Computer-directed assembly of a polynucleotide encoding a target polypeptide

ABSTRACT

The present invention outlines a novel approach to utilizing the results of genomic sequence information by computer-directed polynucleotide assembly based upon information available in databases such as the human genome database. Specifically, the present invention may be used to select, synthesize and assemble a novel, synthetic target polynucleotide sequence encoding a target polypeptide. The target polynucleotide may encode a target polypeptide that exhibits enhanced or altered biological activity as compared to a model polypeptide encoded by a natural (wild-type) or model polynucleotide sequence.

This application is based on, and claims the benefit of, U.S. Provisional Application No. 60/262,693, filed Jan. 19, 2001, and entitled COMPUTER-DIRECTED ASSEMBLY OF A POLYNUCLEOTIDE ENCODING A TARGET POLYPEPTIDE, which is incorporated herein by reference.

TECHNICAL FIELD

The present invention relates generally to the area of bioinformatics and more specifically to methods, algorithms and apparatus for computer-directed polynucleotide assembly. The invention further relates to the production of polypeptides encoded by polynucleotides assembled by the invention.

BACKGROUND

Enzymes, antibodies, receptors and ligands are polypeptides that have evolved by selective pressure to perform very specific biological functions within the milieu of a living organism. The use of a polypeptide for specific technological applications may require the polypeptide to function in environments or on substrates for which it was not evolutionarily selected. Polypeptides isolated from microorganisms that thrive in extreme environments provide ample evidence that these molecules are, in general, malleable with regard to structure and function. However, the process for isolating a polypeptide from its native environment is expensive and time consuming. Thus, new methods for synthetically evolving genetic material encoding a polypeptide possessing a desired activity are needed.

There are two ways to obtain genetic material for genetic engineering manipulations: (1) isolation and purification of a polynucleotide in the form of DNA or RNA from natural sources or (2) the synthesis of a polynucleotide using various chemical-enzymatic approaches. The former approach is limited to naturally-occurring sequences that do not easily lend themselves to specific modification. The latter approach is much more complicated and labor-intensive. However, the chemical-enzymatic approach has many attractive features including the possibility of preparing, without any significant limitations, any desirable polynucleotide sequence.

Two general methods currently exist for the synthetic assembly of oligonucleotides into long polynucleotide fragments. In the first, oligonucleotides covering the entire sequence to be synthesized are allowed to anneal, and then the nicks are repaired with ligase. The fragment is then cloned directly, or cloned after amplification by the polymerase chain reaction (PCR). The polynucleotide is subsequently used for in vitro assembly into longer sequences. The second general method for gene synthesis utilizes polymerase to fill in single-stranded gaps in the annealed pairs of oligonucleotides. After the polymerase reaction, single-stranded regions of oligonucleotides become double-stranded, and after digestion with restriction endonuclease, can be cloned directly or used for further assembly of longer sequences by ligating different double-stranded fragments. Typically, subsequent to the polymerase reaction, each segment must be cloned, which significantly delays the synthesis of long DNA fragments and greatly decreases the efficiency of this approach.

The creation of entirely novel polynucleotides, or the substantial modification of existing polynucleotides, is extremely time consuming and expensive, requires multiple complex steps, and in some cases is impossible. Therefore, there exists a great need for an efficient means to assemble synthetic polynucleotides of any desired sequence. Such a method could be universally applied. For example, the method could be used to efficiently make an array of polynucleotides having specific substitutions in a known sequence that is expressed and screened for improved function. The present invention satisfies these needs by providing efficient and powerful methods and compositions for the synthesis of a target polynucleotide encoding a target polypeptide.

SUMMARY

The present invention addresses the limitations in present recombinant nucleic acid manipulations by providing a fast, efficient means for generating a nucleic acid sequence, including entire genes, chromosomal segments, chromosomes and genomes. Because the approach is completely synthetic, there are no limitations, such as the availability of existing nucleic acids, to hinder the construction of even very large segments of nucleic acid.

In one embodiment, the invention provides a method of synthesizing a target polynucleotide sequence including: a) providing a target polynucleotide sequence; b) identifying at least one initiating polynucleotide present in the target polynucleotide which includes at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of a 5′ overhang and a 3′ overhang; c) identifying a second polynucleotide present in the target polynucleotide which is contiguous with the initiating polynucleotide and includes at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of a 5′ overhang, a 3′ overhang, or a 5′ overhang and a 3′ overhang, where at least one overhang of the second polynucleotide is complementary to at least one overhang of the initiating polynucleotide; d) identifying a third polynucleotide present in the target polynucleotide which is contiguous with the initiating sequence and includes at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of a 5′ overhang, a 3′ overhang, or a 5′ overhang and a 3′ overhang, where at least one overhang of the third polynucleotide is complementary to at least one overhang of the initiating polynucleotide which is not complementary to an overhang of the second polynucleotide; e) contacting the initiating polynucleotide with the second polynucleotide and the third polynucleotide under conditions and for such time suitable for annealing, the contacting resulting in a contiguous double-stranded polynucleotide, resulting in the bi-directional extension of the initiating polynucleotide; f) in the absence of primer extension, optionally contacting the mixture of e) with a ligase under conditions suitable for ligation; and g) optionally repeating b) through f) to sequentially add double-stranded polynucleotides to the extended initiating polynucleotide through repeated cycles of annealing and ligation, whereby a target polynucleotide is synthesized.

The invention further provides a method of synthesizing a target polynucleotide including: a) providing a target polynucleotide sequence derived from a model sequence; b) identifying at least one initiating polynucleotide sequence present in the target polynucleotide sequence of a), wherein the initiating polynucleotide includes: 1) a first plus strand oligonucleotide; 2) a second plus strand oligonucleotide contiguous with the first plus strand oligonucleotide; and 3) a minus strand oligonucleotide including a first contiguous sequence which is at least partially complementary to the first plus strand oligonucleotide and a second contiguous sequence which is at least partially complementary to the second plus strand oligonucleotide; c) annealing the first plus strand oligonucleotide and the second plus strand oligonucleotide to the minus strand oligonucleotide of b) resulting in a partially double-stranded initiating polynucleotide including a 5′ overhang and a 3′ overhang; d) identifying a second polynucleotide sequence present in the target polynucleotide sequence of a), wherein the second polynucleotide sequence is contiguous with the initiating polynucleotide sequence and includes: 1) a first plus strand oligonucleotide; 2) a second plus strand oligonucleotide contiguous with the first plus strand oligonucleotide; and 3) a minus strand oligonucleotide comprising a first contiguous sequence which is at least partially complementary to the first plus strand oligonucleotide and a second contiguous sequence which is at least partially complementary to the second plus strand oligonucleotide; e) annealing the first plus strand oligonucleotide and the second plus strand oligonucleotide to the minus strand oligonucleotide of d) resulting in a partially double-stranded second polynucleotide, wherein at least one overhang of the second polynucleotide is complementary to at least one overhang of the initiating polynucleotide; f) identifying a third polynucleotide present in the target polynucleotide of a), wherein the third polynucleotide is contiguous with the initiating sequence and comprises: 1) a first plus strand oligonucleotide; 2) a second plus strand oligonucleotide contiguous with the first plus strand oligonucleotide; and 3) a minus strand oligonucleotide comprising a first contiguous sequence which is at least partially complementary to the first plus strand oligonucleotide and a second contiguous sequence which is at least partially complementary to the second plus strand oligonucleotide; g) annealing the first plus strand oligonucleotide and the second plus strand oligonucleotide to the minus strand oligonucleotide of f) resulting in a partially double-stranded third polynucleotide, wherein at least one overhang of the third polynucleotide is complementary to at least one overhang of the initiating polynucleotide and not complementary to an overhang of the second polynucleotide; h) contacting the initiating polynucleotide of c) with the second polynucleotide of e) and the third polynucleotide of g) under conditions and for such time suitable for annealing, the contacting resulting in a contiguous double-stranded polynucleotide, wherein the initiating sequence is extended bi-directionally; i) in the absence of primer extension, optionally contacting the mixture of h) with a ligase under conditions suitable for ligation; and j) optionally repeating b) through i) to sequentially add double-stranded polynucleotides to the extended initiating polynucleotide through repeated cycles of annealing and ligation, whereby a target polynucleotide is synthesized.

In another embodiment, the invention provides a method for synthesizing a target polynucleotide, including: a) providing a target polynucleotide sequence derived from a model sequence; b) identifying at least one initiating polynucleotide present in the target polynucleotide which includes at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide; c) contacting the initiating polynucleotide under conditions suitable for primer annealing with a first oligonucleotide having partial complementarity to the 3′ portion of the plus strand of the initiating polynucleotide, and a second oligonucleotide having partial complementarity to the 3′ portion of the minus strand of the initiating polynucleotide; d) catalyzing under conditions suitable for primer extension: 1) polynucleotide synthesis from the 3′-hydroxyl of the plus strand of the initiating polynucleotide; 2) polynucleotide synthesis from the 3′-hydroxyl of the annealed first oligonucleotide; 3) polynucleotide synthesis from the 3′-hydroxyl of the minus strand of the initiating polynucleotide; and 4) polynucleotide synthesis from the 3′-hydroxyl of the annealed second oligonucleotide, resulting in the bi-directional extension of the initiating sequence thereby forming a nascent extended initiating polynucleotide; e) contacting the extended initiating polynucleotide of d) under conditions suitable for primer annealing with a third oligonucleotide having partial complementarity to the 3′ portion of the plus strand of the extended initiating polynucleotide, and a fourth oligonucleotide having partial complementarity to the 3′ portion of the minus strand of the extended initiating polynucleotide; f) catalyzing under conditions suitable for primer extension: 1) polynucleotide synthesis from the 3′-hydroxyl of the plus strand of the extended initiating polynucleotide; 2) polynucleotide synthesis from the 3′-hydroxyl of the annealed third oligonucleotide; 3) polynucleotide synthesis from the 3′-hydroxyl of the minus strand of the extended initiating polynucleotide; and 4) polynucleotide synthesis from the 3′-hydroxyl of the annealed fourth oligonucleotide, resulting in the bi-directional extension of the initiating sequence thereby forming a nascent extended initiating polynucleotide; and g) optionally repeating e) through f) as desired, resulting in formation of the target polynucleotide sequence.

The invention further provides a method for isolating a target polypeptide encoded by a target polynucleotide generated by a method of the invention by: a) incorporating the target polynucleotide in an expression vector; b) introducing the expression vector into a suitable host cell; c) culturing the cell under conditions and for such time as to promote the expression of the target polypeptide encoded by the target polynucleotide; and d) isolating the target polypeptide.

The invention further provides a method of synthesizing a target polynucleotide including: a) providing a target polynucleotide sequence derived from a model sequence; b) chemically synthesizing a plurality of single-stranded oligonucleotides each of which is partially complementary to at least one oligonucleotide present in the plurality, where the sequence of the plurality of oligonucleotides is a contiguous sequence of the target polynucleotide; c) contacting the partially complementary oligonucleotides under conditions and for such time suitable for annealing, the contacting resulting in a plurality of partially double-stranded polynucleotides, where each double-stranded polynucleotide includes a 5′ overhang and a 3′ overhang; d) identifying at least one initiating polynucleotide derived from the model sequence present in the plurality of double-stranded polynucleotides; e) in the absence of primer extension, subjecting a mixture including the initiating polynucleotide and 1) a double-stranded polynucleotide that will anneal to the 5′ portion of said initiating polynucleotide; 2) a double-stranded polynucleotide that will anneal to the 3′ portion of the initiating polynucleotide; and 3) a DNA ligase to conditions suitable for annealing and ligation, wherein the initiating polynucleotide is extended bi-directionally; and f) sequentially annealing double-stranded polynucleotides to the extended initiating polynucleotide through repeated cycles of annealing, whereby the target polynucleotide is produced.

The invention further provides a computer program, stored on a computer-readable medium, for generating a target polynucleotide sequence derived from a model sequence, the computer program comprising instructions for causing a computer system to: a) identify an initiating polynucleotide sequence contained in the target polynucleotide sequence; b) parse the target polynucleotide sequence into multiple distinct, partially complementary, oligonucleotides; and c) control assembly of the target polynucleotide sequence by controlling the bi-directional extension of the initiating polynucleotide sequence by the sequential addition of partially complementary oligonucleotides resulting in a contiguous double-stranded polynucleotide.
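As a purely illustrative sketch of instruction b) above, the following Python fragment parses a target sequence into plus strand oligonucleotides and staggered minus strand oligonucleotides, so that each minus strand oligonucleotide is partially complementary to two adjacent plus strand oligonucleotides. The function names, the oligonucleotide length and the offset are hypothetical choices made for illustration and are not specified in this document.

```python
# Hypothetical sketch: names, oligo_len and offset are illustrative assumptions,
# not parameters taken from the specification.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_target(target, oligo_len=50, offset=25):
    """Parse a target sequence into staggered plus and minus strand oligos.

    Plus strand oligos tile the target end to end. Minus strand oligos are
    the reverse complements of windows shifted by `offset`, so each one
    bridges the junction between two adjacent plus strand oligos.
    """
    plus = [target[i:i + oligo_len]
            for i in range(0, len(target), oligo_len)]
    minus = [reverse_complement(target[i:i + oligo_len])
             for i in range(offset, len(target), oligo_len)]
    return plus, minus


if __name__ == "__main__":
    plus, minus = parse_target("ATGGCCAAGTTGACCAGTGC", oligo_len=8, offset=4)
    print(plus)
    print(minus)
```

With an offset of half the oligonucleotide length, every junction between plus strand oligonucleotides falls in the middle of a minus strand oligonucleotide; other offsets would yield overhangs of different lengths.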

The invention further provides a method for automated synthesis of a target polynucleotide sequence, including: a) providing the user with an opportunity to communicate a desired target polynucleotide sequence; b) allowing the user to transmit the desired target polynucleotide sequence to a server; c) providing the user with a unique designation; and d) obtaining the transmitted target polynucleotide sequence provided by the user.

The invention further provides a method for automated synthesis of a polynucleotide sequence, including: a) providing a user with a mechanism for communicating a model polynucleotide sequence; b) optionally providing the user with an opportunity to communicate at least one desired modification to the model sequence if desired; c) allowing the user to transmit the model sequence and desired modification to a server; d) providing the user with a unique designation; e) obtaining the transmitted model sequence and optional desired modification provided by the user; f) inputting into a programmed computer, through an input device, data including at least a portion of the model polynucleotide sequence; g) determining, using the processor, the sequence of the model polynucleotide sequence containing the desired modification; h) further determining, using the processor, at least one initiating polynucleotide sequence present in the model polynucleotide sequence; i) selecting, using the processor, a model for synthesizing the modified model polynucleotide sequence based on the position of the initiating sequence in the model polynucleotide sequence; and j) outputting, to the output device, the results of the at least one determination.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. For example, the one-letter and three-letter abbreviations for amino acids and the one-letter abbreviations for nucleotides are commonly understood. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described below. In addition, the materials, methods and examples are illustrative only and not intended to be limiting. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In case of conflict, the present specification, including definitions, will control.

The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims.

DESCRIPTION OF DRAWINGS

Like reference symbols in the various drawings indicate like elements.

FIG. 1 depicts 96-well plates for F (i.e., “forward” or “plus strand”) oligonucleotide synthesis, R (i.e., “reverse” or “minus strand”) oligonucleotide synthesis, and a T (i.e., “temperature”) plate for the annealing of F and R oligonucleotides.

FIG. 2 depicts the oligonucleotide pooling plan where F oligonucleotides and R oligonucleotides are annealed to form a contiguous polynucleotide.

FIG. 3 depicts the schematic of assembly of a target polynucleotide sequence defining a gene, genome, set of genes or polypeptide sequence. The sequence is designed by computer and used to generate a set of parsed oligonucleotide fragments covering the + and − strand of a target polynucleotide sequence encoding a target polypeptide.

FIG. 4 depicts a schematic of the polynucleotide synthesis modules. A nanodispensing head with a plurality of valves will deposit synthesis chemicals in assembly vessels. Chemical distribution from the reagent reservoir can be controlled using a syringe pump. Underlying the reaction chambers is a set of assembly vessels linked to microchannels that will move fluids by microfluidics.

FIG. 5 depicts that oligonucleotide synthesis, oligonucleotide assembly by pooling and annealing, and ligation can be accomplished using microfluidic mixing.

FIG. 6 depicts the sequential pooling of oligonucleotides synthesized in arrays.

FIG. 7 depicts the pooling stage of the oligonucleotide components through the manifold assemblies resulting in the complete assembly of all oligonucleotides from the array.

FIG. 8 depicts an example of an assembly module comprising a complete set of pooling manifolds produced using microfabrication in a single unit. Various configurations of the pooling manifold will allow assembly of increased numbers of well arrays of parsed component oligonucleotides.

FIG. 9 depicts the configuration for the assembly of oligonucleotides synthesized in a pre-defined array. Passage through the assembly device in the presence of DNA ligase and other appropriate buffer and chemical components will facilitate double-stranded polynucleotide assembly.

FIG. 10 depicts an example of the pooling device design. Microgrooves or microfluidic channels are etched into the surface of the pooling device. The device provides a microreaction vessel at the junction of two channels for 1) mixing of the two streams, 2) controlled temperature maintenance or cycling at the site of the junction and 3) expulsion of the ligated mixture from the exit channel into the next set of pooling and ligation chambers.

FIG. 11 depicts the design of a polynucleotide synthesis platform comprising microwell plates addressed with a plurality of channels for microdispensing.

FIG. 12 depicts an example of a high capacity polynucleotide synthesis platform using high density microwell microplates capable of synthesizing in excess of 1536 component oligonucleotides per plate.

FIG. 13 depicts a polynucleotide assembly format using surface-bound oligonucleotide synthesis rather than soluble synthesis. In this configuration, oligonucleotides are synthesized with a linker that allows attachment to a solid support.

FIG. 14 depicts a diagram of systematic polynucleotide assembly on a solid support. A set of parsed component oligonucleotides is arranged in an array with a stabilizer oligonucleotide attached. A set of ligation substrate oligonucleotides is placed in the solution and systematic assembly is carried out in the solid phase by sequential annealing, ligation and melting.

FIG. 15 depicts polynucleotide assembly using component oligonucleotides bound to a set of metal electrodes on a microelectronic chip. Each electrode can be controlled independently with respect to current and voltage.

FIG. 16 depicts generally a primer extension assembly method of the invention.

FIG. 17 provides a system diagram of the invention.

FIG. 18 depicts a perspective view of an instrument of the invention.

DETAILED DESCRIPTION

The complete sequence of complex genomes, including the human genome, makes large scale functional approaches to genetics possible. The present invention outlines a novel approach to utilizing the results of genomic sequence information by computer-directed polynucleotide assembly based upon information available in databases such as the human genome database. Specifically, the present invention may be used to synthesize, assemble and select a novel, synthetic target polynucleotide sequence encoding a target polypeptide. The target polynucleotide may encode a target polypeptide that exhibits enhanced or altered biological activity as compared to a model polypeptide encoded by a natural (wild-type) or model polynucleotide sequence. Subsequently, standard assays may be used to survey the activity of an expressed target polypeptide. For example, the expressed target polypeptide can be assayed to determine its ability to carry out the function of the corresponding model polypeptide or to determine whether a target polypeptide exhibiting a new function has been produced. Thus, the present invention provides a means for synthetically evolving a model polypeptide by synthesizing, in a computer-directed fashion, polynucleotides encoding a target polypeptide derived from a model polypeptide.

In one embodiment, the invention provides a method of synthesizing a target polynucleotide by providing a target polynucleotide sequence and identifying at least one initiating polynucleotide present in the target polynucleotide which includes at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of a 5′ overhang and a 3′ overhang. As used herein, a “target polynucleotide sequence” includes any nucleic acid sequence suitable for encoding a target polypeptide that can be synthesized by a method of the invention. A target polynucleotide sequence can be used to generate a target polynucleotide using an apparatus capable of assembling nucleic acid sequences. Generally, a target polynucleotide sequence is a linear segment of DNA having a double-stranded region; the segment may be of any length sufficiently long to be created by the hybridization of at least two oligonucleotides having complementary regions. It is contemplated that a target polynucleotide can be 100, 200, 300, 400, 800, 1000, 1500, 2000, 4000, 8000, 10,000, 12,000, 18,000, 20,000, 40,000, 80,000 or more base pairs in length. Indeed, it is contemplated that the methods of the present invention will be able to create entire artificial genomes of lengths comparable to known bacterial, yeast, viral, mammalian, amphibian, reptilian, or avian genomes. In more particular embodiments, the target polynucleotide is a gene encoding a polypeptide of interest. The target polynucleotide may further include non-coding elements such as origins of replication, telomeres, promoters, enhancers, transcription and translation start and stop signals, introns, exon splice sites, chromatin scaffold components and other regulatory sequences. The target polynucleotide may comprise multiple genes, chromosomal segments, chromosomes and even entire genomes. A polynucleotide of the invention may be derived from prokaryotic or eukaryotic sequences including bacterial, yeast, viral, mammalian, amphibian, reptilian, avian, plant, archaebacterial and other DNA-containing living organisms.

An “oligonucleotide”, as used herein, is defined as a molecule comprised of two or more deoxyribonucleotides or ribonucleotides, preferably more than three. Its exact size will depend on many factors, such as the reaction temperature, salt concentration, the presence of denaturants such as formamide, and the degree of complementarity with the sequence to which the oligonucleotide is intended to hybridize.

The term “nucleotide” as used herein can refer to nucleotides present in either DNA or RNA and thus includes nucleotides which incorporate adenine, cytosine, guanine, thymine and uracil as base, the sugar moiety being deoxyribose or ribose. It will be appreciated, however, that other modified bases capable of base pairing with one of the conventional bases, adenine, cytosine, guanine, thymine and uracil, may be used in an oligonucleotide employed in the present invention. Such modified bases include, for example, 8-azaguanine and hypoxanthine. If desired, the nucleotides may carry a label or marker so that on incorporation into a primer extension product, they augment the signal associated with the primer extension product, for example for capture on to solid phase.

A “plus strand” oligonucleotide, by convention, includes a short, single-stranded DNA segment that starts with the 5′ end to the left as one reads the sequence. A “minus strand” oligonucleotide includes a short, single-stranded DNA segment that starts with the 3′ end to the left as one reads the sequence. Methods of synthesizing oligonucleotides are found in, for example, Oligonucleotide Synthesis: A Practical Approach, Gait, ed., IRL Press, Oxford (1984), incorporated herein by reference in its entirety. Solid-phase synthesis techniques have been provided for the synthesis of several peptide sequences on, for example, a number of “pins” (See e.g., Geysen et al., J. Immun. Meth. (1987) 102:259-274, incorporated herein by reference in its entirety).

Additional methods of forming large arrays of oligonucleotides and other polymer sequences in a short period of time have been devised. Of particular note, Pirrung et al., U.S. Pat. No. 5,143,854 (see also PCT Application No. WO 90/15070), Fodor et al., PCT Publication No. WO 92/10092 and Winkler et al., U.S. Pat. No. 6,136,269, all incorporated herein by reference, disclose methods of forming vast arrays of polymer sequences using, for example, light-directed synthesis techniques. See also, Fodor et al., Science (1991) 251:767-777, also incorporated herein by reference in its entirety. Some work has been done to automate synthesis of polymer arrays. For example, Southern, PCT Application No. WO 89/10977, describes the use of a conventional pen plotter to deposit three different monomers at twelve distinct locations on a substrate.

An “initiating polynucleotide sequence,” as used herein, is a sequence contained in a target polynucleotide sequence and identified by an algorithm of the invention. An “initiating polynucleotide” is the physical embodiment of an initiating polynucleotide sequence. For ligation assembly of a target polynucleotide, an initiating polynucleotide begins assembly by providing an anchor for hybridization of subsequent polynucleotides contiguous with the initiating polynucleotide. Thus, for ligation assembly, an initiating polynucleotide is a partially double-stranded nucleic acid, thereby providing single-stranded overhang(s) for annealing of a contiguous, double-stranded nucleic acid molecule. For primer extension assembly of a target polynucleotide, an initiating polynucleotide begins assembly by providing a template for hybridization of subsequent oligonucleotides contiguous with the initiating polynucleotide. Thus, for primer extension assembly, an initiating polynucleotide can be partially double-stranded or fully double-stranded.
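The bi-directional growth from an initiating polynucleotide can be pictured as an ordering problem. The sketch below, which assumes the target has already been divided into consecutively numbered fragments, lists which fragments would be added in each annealing round on either side of the initiating fragment. The function name and the fragment indexing are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical sketch: fragment indices and the function name are
# illustrative assumptions, not part of the specification.
def bidirectional_order(n_fragments, start):
    """List the fragments added in each round around the initiating fragment.

    Round 0 holds the initiating fragment. Each later round adds the next
    fragment on the left and on the right, where one exists.
    """
    if not 0 <= start < n_fragments:
        raise ValueError("start must index one of the fragments")
    rounds = [[start]]
    left, right = start - 1, start + 1
    while left >= 0 or right < n_fragments:
        step = []
        if left >= 0:
            step.append(left)
            left -= 1
        if right < n_fragments:
            step.append(right)
            right += 1
        rounds.append(step)
    return rounds


if __name__ == "__main__":
    # Six fragments, assembly initiated at fragment 2.
    print(bidirectional_order(6, 2))  # [[2], [1, 3], [0, 4], [5]]
```

In this simplified picture, an initiating fragment near the middle of the target minimizes the number of rounds, since both ends are reached at about the same time.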

In one embodiment, an initiating polynucleotide of the invention can be bound to a solid support for improved efficiency. The solid phase allows for the efficient separation of the assembled target polynucleotide from other components of the reaction. Different supports can be applied in the method. For example, supports can be magnetic latex beads or magnetic controlled pore glass beads that allow the desirable product to be magnetically separated from the reaction mixture. Binding the initiating polynucleotide to such beads can be accomplished by a variety of known methods, for example carbodiimide treatment (Gilham, Biochemistry 7:2809-2813 (1968); Mizutani and Tachibana, J. Chromatography 356:202-205 (1986); Wolf et al., Nucleic Acids Res. 15:2911-2926 (1987); Musso, Nucleic Acids Res. 15:5353-5372 (1987); Lund et al., Nucleic Acids Res. 16:10861-10880 (1988)).

The initiating polynucleotide attached to the solid phase can act as an anchor for the continued synthesis of the target polynucleotide. Assembly can be accomplished by addition of contiguous polynucleotides together with ligase for ligation assembly or by addition of oligonucleotides together with polymerase for primer extension assembly. After the appropriate incubation time, unbound components of the method can be washed out and the reaction can be repeated to improve the efficiency of template utilization. Alternatively, another set of polynucleotides or oligonucleotides can be added to continue the assembly.

Solid phase, to be efficiently used for the synthesis, can contain pores with sufficient room for synthesis of the long nucleic acid molecules. The solid phase can be composed of material that does not non-specifically bind any undesired components of the reaction. One way to solve the problem is to use controlled pore glass beads appropriate for long DNA molecules. The initiating polynucleotide can be attached to the beads through a long connector. The role of the connector is to position the initiating polynucleotide at a desirable distance from the surface of the solid support.

The method of the invention further includes identifying a second polynucleotide sequence present in the target polynucleotide which is contiguous with the initiating polynucleotide and includes at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of a 5′ overhang, a 3′ overhang, or a 5′ overhang and a 3′ overhang, where at least one overhang of the second polynucleotide is complementary to at least one overhang of the initiating polynucleotide. Two or more oligonucleotides having complementary regions, where they are permitted, will “anneal” (i.e., base pair) under the appropriate conditions, thereby producing a double-stranded region. In order to anneal (i.e., hybridize), oligonucleotides must be at least partially complementary. The term “complementary to” is used herein in relation to nucleotides to mean a nucleotide that will base pair with another specific nucleotide. Thus adenosine triphosphate is complementary to uridine triphosphate or thymidine triphosphate and guanosine triphosphate is complementary to cytidine triphosphate.

As used herein, a 5′ or 3′ “overhang” means a region on the 5′ or 3′, or 5′ and 3′, end of a polynucleotide that is single-stranded, i.e. not base paired. An overhang provides a means for the subsequent annealing of a contiguous polynucleotide whose own overhang is complementary to it. Depending on the application envisioned, one will desire to employ varying conditions of annealing to achieve varying degrees of annealing selectivity.

For applications requiring high selectivity, one typically will desire to employ relatively stringent conditions to form the hybrids, e.g., one will select relatively low salt and/or high temperature conditions, such as provided by about 0.02 M to about 0.10 M NaCl at temperatures of about 50° C. to about 70° C. Such high stringency conditions tolerate little, if any, mismatch between the oligonucleotide and the template or target strand. It generally is appreciated that conditions can be rendered more stringent by the addition of increasing amounts of formamide.

For certain applications, for example, by analogy to substitution of nucleotides by site-directed mutagenesis, it is appreciated that lower stringency conditions may be used. Under these conditions, hybridization may occur even though the sequences of probe and target strand are not perfectly complementary, but are mismatched at one or more positions. Conditions may be rendered less stringent by increasing salt concentration and decreasing temperature. For example, a medium stringency condition could be provided by about 0.1 to 0.25 M NaCl at temperatures of about 37° C. to about 55° C., while a low stringency condition could be provided by about 0.15 M to about 0.9 M salt, at temperatures ranging from about 20° C. to about 55° C. Thus, hybridization conditions can be readily manipulated depending on the desired results.
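The approximate salt and temperature ranges quoted for high, medium and low stringency can be collected into a small lookup, sketched below in Python. The ranges are those stated in the text; the names `Stringency`, `CONDITIONS` and `matching_stringencies` are hypothetical. Because the quoted ranges overlap, a given condition may fall into more than one class.

```python
from typing import List, NamedTuple, Tuple


class Stringency(NamedTuple):
    name: str
    salt_m: Tuple[float, float]  # approximate salt range, mol/L
    temp_c: Tuple[float, float]  # approximate temperature range, deg C


# Approximate ranges as quoted in the text ("about" in every case).
CONDITIONS = [
    Stringency("high", (0.02, 0.10), (50, 70)),
    Stringency("medium", (0.10, 0.25), (37, 55)),
    Stringency("low", (0.15, 0.90), (20, 55)),
]


def matching_stringencies(salt_m: float, temp_c: float) -> List[str]:
    """Return every stringency class whose quoted ranges contain the condition."""
    return [
        c.name
        for c in CONDITIONS
        if c.salt_m[0] <= salt_m <= c.salt_m[1]
        and c.temp_c[0] <= temp_c <= c.temp_c[1]
    ]


print(matching_stringencies(0.05, 65))  # ['high']
print(matching_stringencies(0.20, 40))  # ['medium', 'low']
```

The sketch ignores formamide and sequence-dependent factors such as G+C content, which also affect stringency in practice.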

In certain embodiments, it will be advantageous to determine the hybridization of oligonucleotides by employing a label. A wide variety of appropriate labels are known in the art, including fluorescent, radioactive, enzymatic or other ligands, such as avidin/biotin, which are capable of being detected. In preferred embodiments, one may desire to employ a fluorescent label or an enzyme tag such as urease, alkaline phosphatase or peroxidase, instead of radioactive or other environmentally undesirable reagents. In the case of enzyme tags, colorimetric indicator substrates are known that can be employed to provide a means for detection, visible to the human eye or spectrophotometrically, to identify whether specific hybridization with complementary oligonucleotide has occurred.

In embodiments involving a solid phase, for example, at least one oligonucleotide of an initiating polynucleotide is adsorbed or otherwise affixed to a selected matrix or surface. This fixed, single-stranded nucleic acid is then subjected to hybridization with the complementary oligonucleotides under desired conditions. The selected conditions will also depend on the particular circumstances based on the particular criteria required (depending, for example, on the G+C content, type of target nucleic acid, source of nucleic acid, size of hybridization probe, etc.). Following washing of the hybridized surface to remove non-specifically bound oligonucleotides, the hybridization may be detected, or even quantified, by means of the label.

The method of the invention further provides a third polynucleotide present in the target polynucleotide which is contiguous with the initiating sequence and provides a 5′ overhang, a 3′ overhang, or a 5′ overhang and a 3′ overhang, where at least one overhang of the third polynucleotide is complementary to at least one overhang of the initiating polynucleotide which is not complementary to an overhang of the second polynucleotide.

The method further provides contacting the initiating polynucleotide with the second polynucleotide and the third polynucleotide under conditions and for such time suitable for annealing, the contacting resulting in a contiguous double-stranded polynucleotide and thus in the bi-directional extension of the initiating polynucleotide. The annealed polynucleotides are optionally contacted with a ligase under conditions suitable for ligation. The method discussed above is optionally repeated to sequentially add double-stranded polynucleotides to the extended initiating polynucleotide through repeated cycles of annealing and ligation.

A target polynucleotide sequence can be designed de novo or derived from a “model polynucleotide sequence”. As used herein, a “model polynucleotide sequence” includes any nucleic acid sequence that encodes a model polypeptide sequence. A model polypeptide sequence provides a basis for designing a modified polynucleotide such that a target polynucleotide incorporating the desired modification is synthesized.

The present invention also provides methods that can be used to synthesize, de novo, polynucleotides that encode sets of genes, either naturally occurring genes expressed from natural or artificial promoter constructs or artificial genes derived from synthetic DNA sequences, which encode elements of biological systems that perform a specified function or attribute of an artificial organism, as well as entire genomes. In producing such systems and genomes, the present invention provides the synthesis of a replication-competent, double-stranded polynucleotide, wherein the polynucleotide has an origin of replication, a first coding region and a first regulatory element directing the expression of the first coding region. By replication competent, it is meant that the polynucleotide is capable of directing its own replication. Thus, it is envisioned that the polynucleotide will possess all the cis-acting signals required to facilitate its own synthesis. In this respect, the polynucleotide will be similar to a plasmid or a virus, such that once placed within a cell, it is capable of replication by a combination of the polynucleotide's and cellular functions.

A polynucleotide sequence defining a gene, genome, set of genes or protein sequence can be designed in a computer-assisted manner (discussed below) and used to generate a set of parsed oligonucleotides covering the plus (+) and minus (−) strand of the sequence. As used herein, “parsed” means a target polynucleotide sequence has been delineated in a computer-assisted manner such that a series of contiguous oligonucleotide sequences are identified. The oligonucleotide sequences are individually synthesized and used in a method of the invention to generate a target polynucleotide. The length of an oligonucleotide is quite variable. Preferably, oligonucleotides used in the methods of the invention are between about 15 and 100 bases and more preferably between about 20 and 50 bases. Specific lengths include, but are not limited to 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 and 100 bases. Depending on the size, the overlap between the oligonucleotides having partial complementarity may be designed to be between 5 and 75 bases per oligonucleotide pair.
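One way to picture the parsing step is the Python sketch below. It is an illustration under stated assumptions, not the computer-assisted procedure of the invention: the function name `parse_target`, the fixed oligonucleotide length of 50 bases and the 25-base stagger between strands are illustrative defaults chosen from within the ranges given above.

```python
from typing import List, Tuple


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def parse_target(seq: str, oligo_len: int = 50, offset: int = 25) -> Tuple[List[str], List[str]]:
    """Split a target sequence into contiguous plus-strand oligonucleotides and
    staggered minus-strand oligonucleotides (all written 5'->3').

    Each minus-strand oligo starts `offset` bases into a plus-strand oligo,
    so it bridges the junction between two neighbouring plus-strand oligos.
    """
    seq = seq.upper()
    plus = [seq[i:i + oligo_len] for i in range(0, len(seq), oligo_len)]
    minus = [
        reverse_complement(seq[i:i + oligo_len])
        for i in range(offset, len(seq), oligo_len)
    ]
    return plus, minus


target = "ACGT" * 30  # hypothetical 120-base target
plus, minus = parse_target(target)
print([len(o) for o in plus])   # [50, 50, 20]
print([len(o) for o in minus])  # [50, 45]
```

Terminal oligonucleotides come out shorter than `oligo_len` when the target length is not a multiple of it; a production design would also need to check each junction for unintended cross-hybridization, which this sketch does not attempt.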

The oligonucleotides preferably are treated with polynucleotide kinase, for example, T4 polynucleotide kinase. The kinasing can be performed before or after mixing of the oligonucleotide set, but before annealing. After annealing, the oligonucleotides are treated with an enzyme having a ligating function. For example, a DNA ligase typically will be employed for this function. However, topoisomerase, which does not require 5′ phosphorylation, is rapid and operates at room temperature, may be used instead of ligase. For example, 50 base pair oligonucleotides overlapping by 25 bases can be synthesized by an oligonucleotide array synthesizer (OAS). A 5′ (+) strand set of oligonucleotides is synthesized in one 96-well plate and the second 3′ or (−) strand set is synthesized in a second 96-well microtiter plate. Synthesis can be carried out using phosphoramidite chemistry modified to miniaturize the reaction size and generate small reaction volumes and yields in the range of 2 to 5 nmole. Synthesis is done on controlled pore glass beads (CPGs), then the completed oligonucleotides are deblocked, deprotected and removed from the beads. The oligonucleotides are lyophilized, re-suspended in water and 5′ phosphorylated using polynucleotide kinase and ATP to enable ligation.

The set of arrayed oligonucleotide sequences in the plate can be assembled using a mixed pooling strategy. For example, systematic pooling of component oligonucleotides can be performed using a modified Beckman Biomek automated pipetting robot, or another automated lab workstation. The fragments can be combined with buffer and enzyme (Taq I DNA ligase or Egea Assemblase™, for example). Pooling can be performed in microwell plates. After each step of pooling, the temperature is ramped to enable annealing and ligation, then additional pooling carried out.

Target polynucleotide assembly involves forming a set of intermediates. A set of intermediates can include a plus strand oligonucleotide annealed to a minus strand oligonucleotide, as described above. The annealed intermediate can be formed by providing a single plus strand oligonucleotide annealed to a single minus strand oligonucleotide.

Alternatively, two or more oligonucleotides may comprise the plus strand or the minus strand. For example, in order to construct a polynucleotide (e.g., an initiating polynucleotide) which can be used to assemble a target polynucleotide of the invention, three or more oligonucleotides can be annealed. Thus, a first plus strand oligonucleotide, a second plus strand oligonucleotide contiguous with the first plus strand oligonucleotide, and a minus strand oligonucleotide having a first contiguous sequence which is at least partially complementary to the first plus strand oligonucleotide and second contiguous sequence which is at least partially complementary to the second plus strand oligonucleotide can be annealed to form a partially double-stranded polynucleotide. The polynucleotide can include a 5′ overhang, a 3′ overhang, or a 5′ overhang and a 3′ overhang. The first plus strand oligonucleotide and second plus strand oligonucleotide are contiguous sequences such that they are ligatable. The minus strand oligonucleotide is partially complementary to both plus strand oligonucleotides and acts as a “bridge” or “stabilizer” sequence by annealing to both oligonucleotides. Subsequent polynucleotides comprised of more than two oligonucleotides annealed as previously described, can be used to assemble a target polynucleotide in a manner resulting in a contiguous double-stranded polynucleotide.

An example of using two or more plus strand oligonucleotides to assemble a polynucleotide is shown in FIG. 3. A triplex of three oligonucleotides of about 50 bp each, which overlap by about 25 bp, form a “nicked” intermediate. Two of these oligonucleotides provide a ligation substrate joined by ligase and the third oligonucleotide is a stabilizer that brings together two specific sequences by annealing resulting in the formation of a part of the final polynucleotide construct. This intermediate provides a substrate for DNA ligase which, through its nick sealing activity, joins the two 50-base pair oligonucleotides into a single 100 base single-stranded polynucleotide.
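The nicked triplex intermediate can be modeled as a string check, sketched below in Python. The function `ligate_via_bridge` and its `min_overlap` parameter are hypothetical; the sketch assumes perfect complementarity and treats ligation as simple concatenation once the stabilizer oligonucleotide is found to span the junction.

```python
from typing import Optional


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def ligate_via_bridge(plus_a: str, plus_b: str, bridge: str,
                      min_overlap: int = 5) -> Optional[str]:
    """Join two contiguous plus-strand oligos if the minus-strand `bridge`
    (5'->3') anneals across their junction; otherwise return None.

    `min_overlap` is the assumed minimum number of bridge bases that must
    pair with each plus-strand oligo on either side of the nick.
    """
    joined = plus_a.upper() + plus_b.upper()
    junction = len(plus_a)
    footprint = reverse_complement(bridge)  # plus-sense region the bridge covers
    start = joined.find(footprint)
    while start != -1:
        end = start + len(footprint)
        if start + min_overlap <= junction <= end - min_overlap:
            return joined  # nick sealed: one continuous plus strand
        start = joined.find(footprint, start + 1)
    return None


plus_a = "ATGGCTAGCTAGGATCCGAT"
plus_b = "TTGACCGGTAACTGCAGGTC"
bridge = reverse_complement(plus_a[10:] + plus_b[:10])
print(ligate_via_bridge(plus_a, plus_b, bridge))  # the joined 40-base strand
```

With two 50-base plus-strand oligonucleotides and a 50-base stabilizer offset by 25 bases, the same check would return the single 100-base strand described in the text.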

Following initial pooling and formation of annealed products, the products are assembled into increasingly larger polynucleotides. For example, following triplex formation of oligonucleotides, sets of triplexes are systematically joined, ligated, and assembled. Each step can be mediated by robotic pooling, ligation and thermal cycling to achieve annealing and denaturation. The final step joins assembled pieces into a complete sequence representing all of the fragments in the array. Since the efficiency of yield at each step is less than 100%, the mass amount of completed product in the final mixture may be very small. Optionally, additional specific oligonucleotide primers, usually 15 to 20 bases and complementary to the extreme ends of the assembly, can be annealed and PCR amplification carried out, thereby amplifying and purifying the final full-length product.
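The effect of sub-100% step efficiency on final yield can be estimated with simple arithmetic, sketched below in Python. Two assumptions are made that the text does not state: that fragments are joined pairwise in each pooling round, and that every round has the same efficiency. The 80% efficiency used in the example is a hypothetical value, and the function names are illustrative.

```python
def pooling_rounds(n_fragments: int) -> int:
    """Rounds needed to merge n fragments into one, assuming pairwise
    joining (the fragment count halves each round, rounding up)."""
    return (n_fragments - 1).bit_length()


def final_yield(step_efficiency: float, n_steps: int) -> float:
    """Fraction of starting material that survives n_steps when each
    step independently succeeds with probability step_efficiency."""
    return step_efficiency ** n_steps


rounds = pooling_rounds(96)       # e.g. one 96-well plate of fragments
print(rounds)                     # 7
print(final_yield(0.8, rounds))   # roughly 0.21 of the theoretical product
```

Under these assumptions only about a fifth of the theoretical product remains after seven rounds, which is consistent with the optional PCR amplification of the full-length product described above.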

The methods of the invention provide several improvements over existing polynucleotide synthesis technology. For example, synthesis can utilize microdispensing piezoelectric or microsolenoid nanodispensers allowing very fast synthesis, much smaller reaction volumes and higher density plates as synthesis vessels. The instrument will use up to 1536 well plates giving a very high capacity. Additionally, controlled pooling can be performed by a microfluidic manifold that will move individual oligonucleotides through microchannels and mix/ligate in a controlled way. This will obviate the need for robotic pipetting and increase speed and efficiency. Thus, an apparatus that accomplishes a method of the invention will have a greater capability for simultaneous reactions giving an overall larger capacity for gene length.

Once target polynucleotides have been synthesized using a method of the present invention, it may be necessary to screen the sequences for analysis of function. Specifically contemplated by the present inventor are chip-based DNA technologies. Briefly, these techniques involve quantitative methods for analyzing large numbers of genes rapidly and accurately. By tagging genes with oligonucleotides or using fixed probe arrays, one can employ chip technology to segregate target molecules as high-density arrays and screen these molecules on the basis of hybridization.

The use of combinatorial synthesis and high throughput screening assays is well known to those of skill in the art. For example, U.S. Pat. Nos. 5,807,754; 5,807,683; 5,804,563; 5,789,162; 5,783,384; 5,770,358; 5,759,779; 5,747,334; 5,686,242; 5,198,346; 5,738,996; 5,733,743; 5,714,320; and 5,663,046 (each specifically incorporated herein by reference) describe screening systems useful for determining the activity of a target polypeptide. These patents teach various aspects of the methods and compositions involved in the assembly and activity analyses of high-density arrays of different polysubunits (polynucleotides or polypeptides). As such it is contemplated that the methods and compositions described in the patents listed above may be useful in assaying the activity profiles of the target polypeptides of the present invention.

In another embodiment, the invention provides a method of synthesizing a target polynucleotide by providing a target polynucleotide sequence and identifying at least one initiating polynucleotide sequence present in the target polynucleotide sequence that includes at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a double-stranded polynucleotide. The initiating polynucleotide is contacted under conditions suitable for primer annealing with a first oligonucleotide having partial complementarity to the 3′ portion of the plus strand of the initiating polynucleotide, and a second oligonucleotide having partial complementarity to the 3′ portion of the minus strand of the initiating polynucleotide. Primer extension is subsequently performed using polynucleotide synthesis from the 3′-hydroxyl of: 1) the plus strand of the initiating polynucleotide; 2) the annealed first oligonucleotide; 3) the minus strand of the initiating polynucleotide; and 4) the annealed second oligonucleotide. The synthesis results in the initiating sequence being extended bi-directionally thereby forming a nascent extended initiating polynucleotide. The extended initiating sequence can be further extended by repeated cycles of annealing and primer extension.
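One cycle of the bi-directional primer extension described above can be mimicked on strings, as in the Python sketch below. It models only two of the four extension events (the 3′ ends of the plus and minus strands of the initiating polynucleotide), assumes perfect complementarity in the annealed region, and uses hypothetical names (`extend_3prime`, `min_overlap`) and a hypothetical 40-base target.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def extend_3prime(strand: str, template_oligo: str, min_overlap: int = 8) -> str:
    """Extend `strand` from its 3'-hydroxyl by copying `template_oligo`.

    Both sequences are written 5'->3'. The oligo must be complementary to the
    3' portion of the strand over at least `min_overlap` bases (an assumed
    threshold); the unpaired remainder of the oligo serves as template.
    Returns the strand unchanged if no such overlap exists.
    """
    sense = reverse_complement(template_oligo)
    for k in range(min(len(strand), len(sense)), min_overlap - 1, -1):
        if strand.endswith(sense[:k]):
            return strand + sense[k:]
    return strand


target = "ATGGCTAGCTAGGATCCGATTTGACCGGTAACTGCAGGTC"  # hypothetical target
plus = target[10:30]                           # initiating plus strand
minus = reverse_complement(plus)               # initiating minus strand
first_oligo = reverse_complement(target[20:40])   # pairs with 3' portion of plus
second_oligo = target[0:20]                       # pairs with 3' portion of minus

new_plus = extend_3prime(plus, first_oligo)
new_minus = extend_3prime(minus, second_oligo)
print(new_plus == target[10:40])                      # True: extended rightward
print(reverse_complement(new_minus) == target[0:30])  # True: extended leftward
```

The annealed oligonucleotides would be extended in the same way from their own 3′ ends, and repeating the cycle with further oligonucleotides continues the outward growth in both directions.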

As previously noted, oligonucleotides can be used as building blocks to assemble polynucleotides through annealing and ligation reactions. Alternatively, oligonucleotides can be used as primers to manufacture polynucleotides through annealing and primer extension reactions. The term “primer” is used herein to refer to a binding element which comprises an oligonucleotide, whether occurring naturally as in a purified restriction digest or produced synthetically, which is capable of acting as a point of initiation of synthesis when placed under conditions in which synthesis of a primer extension product which is complementary to a nucleic acid strand is induced, i.e., in the presence of appropriate nucleotides and an agent for polymerization such as a DNA polymerase in an appropriate buffer (“buffer” includes pH, ionic strength, cofactors, etc.) and at a suitable temperature.

The primer is preferably single stranded for maximum efficiency in amplification, but may alternatively be double stranded. If double stranded, the primer is first treated to separate its strands before being used to prepare extension products. Preferably, the primer is an oligodeoxyribonucleotide. The primer must be sufficiently long to prime the synthesis of extension products in the presence of the agent for polymerization. The exact lengths of the primers will depend on many factors, including temperature and source of primer and use of the method. Primers having only short sequences capable of hybridization to the target nucleotide sequence generally require lower temperatures to form sufficiently stable hybrid complexes with the template.

The primers herein are selected to be “substantially” complementary to the different strands of each specific sequence to be amplified. This means that the primers must be sufficiently complementary to hybridize with their respective strands. Therefore, the primer sequence need not reflect the exact sequence of the template. Commonly, however, the primers have exact complementarity except with respect to analyses effected according to the method described in Nucleic Acids Research 17 (7) 2503-2516 (1989) or a corresponding method employing linear amplification or an amplification technique other than the polymerase chain reaction.

The agent for primer extension of an oligonucleotide may be any compound or system that will function to accomplish the synthesis of primer extension products, including enzymes. Suitable enzymes for this purpose include, for example, E. coli DNA Polymerase I, Klenow fragment of E. coli DNA polymerase I, T4 DNA polymerase, other available DNA polymerases, reverse transcriptase, and other enzymes, including thermostable enzymes. The term “thermostable enzyme” as used herein refers to any enzyme that is stable to heat and is heat resistant and catalyses (facilitates) combination of the nucleotides in the proper manner to form the primer extension products which are complementary to each nucleic acid strand. Generally, the synthesis will be initiated at the 3′ end of each primer and will proceed in the 5′ direction along the template strand, until synthesis terminates. A preferred thermostable enzyme that may be employed in the process of the present invention is that which can be extracted and purified from Thermus aquaticus. Such an enzyme has a molecular weight of about 86,000-90,000 daltons. Thermus aquaticus strain YT1 is available without restriction from the American Type Culture Collection, 12301 Parklawn Drive, Rockville, Md., U.S.A. as ATCC 25,104.

Processes for amplifying a desired target polynucleotide are known and have been described in the literature. K. Kleppe et al in J. Mol. Biol., (1971), 56, 341-361 disclose a method for the amplification of a desired DNA sequence. The method involves denaturation of a DNA duplex to form single strands. The denaturation step is carried out in the presence of a sufficiently large excess of two nucleic acid primers that hybridize to regions adjacent to the desired DNA sequence. Upon cooling, two structures are obtained, each containing the full length of the template strand appropriately complexed with primer. DNA polymerase and a sufficient amount of each required nucleoside triphosphate are added whereby two molecules of the original duplex are obtained. The above cycle of denaturation, primer addition and extension is repeated until the appropriate number of copies of the desired target polynucleotide is obtained.
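Because each cycle of denaturation, primer addition and extension yields two duplexes from one, the copy number grows geometrically. The short Python sketch below works through the arithmetic; the `efficiency` parameter (1.0 for ideal doubling) and the function names are illustrative assumptions rather than anything specified in the cited method.

```python
def copies_after(cycles: int, start_copies: int = 1, efficiency: float = 1.0) -> float:
    """Expected copies after a number of cycles; efficiency 1.0 means
    every duplex is duplicated in every cycle (ideal doubling)."""
    return start_copies * (1 + efficiency) ** cycles


def cycles_needed(target_copies: int, start_copies: int = 1) -> int:
    """Smallest number of ideal doubling cycles reaching target_copies."""
    cycles, copies = 0, start_copies
    while copies < target_copies:
        copies *= 2
        cycles += 1
    return cycles


print(copies_after(10))          # 1024.0
print(cycles_needed(1_000_000))  # 20
```

In practice the per-cycle efficiency falls below 1.0, so more cycles are needed than the ideal figure suggests.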

The present invention further provides a method for the expression and isolation of a target polypeptide encoded by a target polynucleotide. The method includes incorporating a target polynucleotide synthesized by a method of the invention into an expression vector; introducing the expression vector into a suitable host cell; culturing the host cell under conditions and for such time as to promote the expression of the target polypeptide encoded by the target polynucleotide; and isolating the target polypeptide.

The invention can be used to modify certain functional, structural, or phylogenic features of a model polynucleotide encoding a model polypeptide resulting in an altered target polypeptide. An input or model polynucleotide sequence encoding a model polypeptide can be electronically manipulated to determine a potential for an effect of an amino acid change (or variance) at a particular site or multiple sites in the model polypeptide. Once identified, a novel target polynucleotide sequence is assembled by a method of the invention such that the target polynucleotide encodes a target polypeptide possessing a characteristic different from that of the model polypeptide.

The methods of the invention may rely on the use of public sequence and structure databases. These databases become more robust as more and more sequences and structures are added. Information regarding the amino acid sequence of a target polypeptide and the tertiary structure of the polypeptide can be used to synthesize oligonucleotides that can be assembled into a target polynucleotide encoding a target polypeptide. A model polypeptide should have sufficient structural information to analyze the amino acids involved in the function of the polypeptide. The structural information can be derived from x-ray crystallography, NMR, or some other technique for determining the structure of a protein at the amino acid or atomic level. Once selected, the sequence and structural information obtained from the model polypeptide can be used to generate a plurality of polynucleotides encoding a plurality of variant amino acid sequences that comprise a target polypeptide. Thus, a model polypeptide can be selected based on overall sequence similarity to the target protein or based on the presence of a portion having sequence similarity to a portion of the target polypeptide.

A “polypeptide”, as used herein, is a polymer in which the monomers are alpha amino acids and are joined together through amide bonds. Amino acids may be the L-optical isomer or the D-optical isomer. Polypeptides are two or more amino acid monomers long and are often more than 20 amino acid monomers long. Standard abbreviations for amino acids are used (e.g., P for proline). These abbreviations are included in Stryer, Biochemistry, Third Ed., 1988, which is incorporated herein by reference for all purposes. With respect to polypeptides, “isolated” refers to a polypeptide that constitutes the major component in a mixture of components, e.g., 50% or more, 60% or more, 70% or more, 80% or more, 90% or more, or 95% or more by weight. Isolated polypeptides typically are obtained by purification from an organism in which the polypeptide has been produced, although chemical synthesis is also possible. Methods of polypeptide purification include, for example, chromatography or immunoaffinity techniques.

Polypeptides of the invention may be detected by sodium dodecyl sulphate (SDS)-polyacrylamide gel electrophoresis followed by Coomassie Blue-staining or Western blot analysis using monoclonal or polyclonal antibodies that have binding affinity for the polypeptide to be detected.

A “chimeric polypeptide,” as used herein, is a polypeptide containing portions of amino acid sequence derived from two or more different proteins, or two or more regions of the same protein that are not normally contiguous.

A “ligand”, as used herein, is a molecule that is recognized by a receptor. Examples of ligands that can be investigated by this invention include, but are not restricted to, agonists and antagonists for cell membrane receptors, toxins and venoms, viral epitopes, hormones, opiates, steroids, peptides, enzyme substrates, cofactors, drugs, lectins, sugars, oligonucleotides, nucleic acids, oligosaccharides, and proteins.

A “receptor”, as used herein, is a molecule that has an affinity for a ligand. Receptors may be naturally-occurring or manmade molecules. They can be employed in their unaltered state or as aggregates with other species. Receptors may be attached, covalently or noncovalently, to a binding member, either directly or via a specific binding substance. Examples of receptors which can be employed by this invention include, but are not restricted to, antibodies, cell membrane receptors, monoclonal antibodies and antisera reactive with specific antigenic determinants, viruses, cells, drugs, polynucleotides, nucleic acids, peptides, cofactors, lectins, sugars, polysaccharides, cellular membranes, and organelles. A “ligand receptor pair” is formed when two molecules have combined through molecular recognition to form a complex.

Specific examples of polypeptides which can be synthesized by this invention include but are not restricted to:

a) Microorganism receptors: Determination of ligands that bind to microorganism receptors such as specific transport proteins or enzymes essential to survival of microorganisms would be a useful tool for discovering new classes of antibiotics. Of particular value would be antibiotics against opportunistic fungi, protozoa, and bacteria resistant to antibiotics in current use.
b) Enzymes: For instance, a receptor can comprise a binding site of an enzyme such as an enzyme responsible for cleaving a neurotransmitter; determination of ligands for this type of receptor to modulate the action of an enzyme that cleaves a neurotransmitter is useful in developing drugs that can be used in the treatment of disorders of neurotransmission.
c) Antibodies: For instance, the invention may be useful in investigating a receptor that comprises a ligand-binding site on an antibody molecule which combines with an epitope of an antigen of interest; determining a sequence that mimics an antigenic epitope may lead to the development of vaccines in which the immunogen is based on one or more of such sequences or lead to the development of related diagnostic agents or compounds useful in therapeutic treatments such as for autoimmune diseases (e.g., by blocking the binding of the “self” antibodies).
d) Polynucleotides: Sequences of polynucleotides may be synthesized to establish DNA or RNA binding sequences that act as receptors for synthesized sequence.
e) Catalytic Polypeptides: Polymers, preferably antibodies, which are capable of promoting a chemical reaction involving the conversion of one or more reactants to one or more products. Such polypeptides generally include a binding site specific for at least one reactant or reaction intermediate and an active functionality proximate to the binding site, which functionality is capable of chemically modifying the bound reactant. Catalytic polypeptides and others are described in, for example, PCT Publication No. WO 90/05746, WO 90/05749, and WO 90/05785, which are incorporated herein by reference for all purposes.
f) Hormone receptors: Identification of the ligands that bind with high affinity to a receptor such as the receptors for insulin and growth hormone is useful in the development of, for example, an oral replacement of the daily injections which diabetics must take to relieve the symptoms of diabetes or a replacement for growth hormone. Other examples of hormone receptors include the vasoconstrictive hormone receptors; determination of ligands for these receptors may lead to the development of drugs to control blood pressure.
g) Opiate receptors: Determination of ligands which bind to the opiate receptors in the brain is useful in the development of less-addictive replacements for morphine and related drugs.

In the context of a polypeptide, the term “structure” refers to the three dimensional arrangement of atoms in the protein. “Function” refers to any measurable property of a protein. Examples of protein function include, but are not limited to, catalysis, binding to other proteins, binding to non-protein molecules (e.g., drugs), and isomerization between two or more structural forms. “Biologically relevant protein” refers to any protein playing a role in the life of an organism.

To identify significant structural motifs, the sequence of the model polypeptide is examined for matches to the entries in one or more databases of recognized domains, e.g., the PROSITE database domains (Bairoch, Nucl. Acids. Res. 24:217, 1997) or the pfam HMM database (Bateman et al., (2000) Nucl. Acids. Res. 28:263). The PROSITE database is a compilation of two types of sequence signatures: profiles, typically representing whole protein domains, and patterns, typically representing just the most highly conserved functional or structural aspects of protein domains.
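As an illustration of matching a model polypeptide against a PROSITE-style pattern, the Python sketch below converts the common elements of PROSITE pattern syntax into a regular expression. The helper name `prosite_to_regex` is hypothetical, only the basic syntax is handled (residues, `x`, `[...]`, `{...}`, repeat counts and the `<`/`>` terminal anchors), and the peptide sequences in the example are made up. The pattern shown is the PROSITE N-glycosylation site pattern.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern into Python regex syntax."""
    pieces = []
    for element in pattern.rstrip(".").split("-"):
        at_start = element.startswith("<")   # N-terminal anchor
        at_end = element.endswith(">")       # C-terminal anchor
        element = element.strip("<>")
        # Separate the residue specification from an optional (n) or (n,m) repeat.
        core, lo, hi = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element).groups()
        if core == "x":
            piece = "."                      # any residue
        elif core.startswith("{"):
            piece = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            piece = core                     # single residue or [allowed set]
        if lo:
            piece += "{" + lo + ("," + hi if hi else "") + "}"
        pieces.append(("^" if at_start else "") + piece + ("$" if at_end else ""))
    return "".join(pieces)


regex = prosite_to_regex("N-{P}-[ST]-{P}.")
print(regex)                                  # N[^P][ST][^P]
print(bool(re.search(regex, "AANGSAA")))      # True
print(bool(re.search(regex, "AANPSAA")))      # False
```

Profiles and hidden Markov models, the other signature types mentioned above, are position-weighted and cannot be expressed as a regular expression in this way.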

The methods of the invention can be used to generate polypeptides containing polymorphisms that have an effect on a catalytic activity of a target polypeptide or a non-catalytic activity of the target polypeptide (e.g., structure, stability, binding to a second protein or polypeptide chain, binding to a nucleic acid molecule, binding to a small molecule, and binding to a macromolecule that is neither a protein nor a nucleic acid). For example, the invention provides a means for assembling any polynucleotide sequence encoding a target polypeptide such that the encoded polypeptide can be expressed and screened for a particular activity. By altering particular amino acids at specific points in the target polypeptide, the operating temperature, operating pH, or any other characteristic of a polypeptide can be manipulated resulting in a polypeptide with a unique activity. Thus, the methods of the invention can be used to identify amino acid substitutions that can be made to engineer the structure or function of a polypeptide of interest (e.g., to increase or decrease a selected activity or to add or remove a selective activity).
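Designing a target polynucleotide that carries an amino acid substitution amounts to swapping one codon in the model coding sequence. The Python sketch below shows this under simplifying assumptions: the standard genetic code, an in-frame coding sequence starting at position 1, and an arbitrary choice of the first codon found for the new residue (no codon-usage optimization). The names `translate` and `substitute` and the nine-base model sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence; trailing partial codons are ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - len(dna) % 3, 3))


def substitute(dna: str, position: int, new_aa: str) -> str:
    """Return a variant coding sequence in which the residue at the 1-based
    `position` is replaced by `new_aa` (first matching codon, chosen arbitrarily)."""
    codon = next(c for c, a in CODON_TABLE.items() if a == new_aa)
    i = 3 * (position - 1)
    return dna[:i] + codon + dna[i + 3:]


model = "ATGGCTAAA"                 # hypothetical model sequence encoding M-A-K
variant = substitute(model, 2, "V")
print(translate(model))    # MAK
print(translate(variant))  # MVK
```

The variant sequence could then be parsed into oligonucleotides and assembled like any other target polynucleotide.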

In addition, the methods of the invention can be used in the identification and analysis of candidate polymorphisms for polymorphism-specific targeting by pharmaceutical or diagnostic agents, for the identification and analysis of candidate polymorphisms for pharmacogenomic applications, and for experimental biochemical and structural analysis of pharmaceutical targets that exhibit amino acid polymorphism.

A library of target polynucleotides encoding a plurality of target polypeptides can be prepared by the present invention. Host cells are transformed by artificial introduction of the vectors containing the target polynucleotide by inoculation under conditions conducive for such transformation. The resultant libraries of transformed clones are then screened for clones which display activity for the polypeptide of interest in a phenotypic assay for activity.

A target polynucleotide of the invention can be incorporated (i.e., cloned) into an appropriate vector. For purposes of expression, the target sequences encoding a target polypeptide of the invention may be inserted into a recombinant expression vector. The term “recombinant expression vector” refers to a plasmid, virus, or other vehicle known in the art that has been manipulated by insertion or incorporation of the polynucleotide sequence encoding a target polypeptide of the invention. The expression vector typically contains an origin of replication, a promoter, as well as specific genes that allow phenotypic selection of the transformed cells. Vectors suitable for use in the present invention include, but are not limited to, the T7-based expression vector for expression in bacteria (Rosenberg et al., Gene, 56:125, 1987), the pMSXND expression vector for expression in mammalian cells (Lee and Nathans, J. Biol. Chem., 263:3521, 1988), baculovirus-derived vectors for expression in insect cells, cauliflower mosaic virus (CaMV), and tobacco mosaic virus (TMV).

Depending on the vector utilized, any of a number of suitable transcription and translation elements, including constitutive and inducible promoters, transcription enhancer elements, transcription terminators, etc., may be used in the expression vector (see, e.g., Bitter et al., Methods in Enzymology, 153:516-544, 1987). These elements are well known to one of skill in the art.

The term “operably linked” or “operably associated” refers to functional linkage between the regulatory sequence and the polynucleotide sequence regulated by the regulatory sequence. The operably linked regulatory sequence controls the expression of the product expressed by the polynucleotide sequence. Alternatively, the functional linkage also includes an enhancer element.

“Promoter” means a nucleic acid regulatory sequence sufficient to direct transcription. Also included in the invention are those promoter elements that are sufficient to render promoter-dependent polynucleotide sequence expression controllable in a cell-type-specific or tissue-specific manner, or inducible by external signals or agents; such elements may be located in the 5′ or 3′ regions of the native gene, or in the introns.

“Gene expression” or “polynucleotide sequence expression” means the process by which a nucleotide sequence undergoes successful transcription and translation such that detectable levels of the delivered nucleotide sequence are expressed in an amount and over a time period so that a functional biological effect is achieved.

In yeast, a number of vectors containing constitutive or inducible promoters may be used (Current Protocols in Molecular Biology, Vol. 2, Ed. Ausubel et al., Greene Publish. Assoc. & Wiley Interscience, Ch. 13, 1988; Grant et al., “Expression and Secretion Vectors for Yeast,” in Methods in Enzymology, Eds. Wu & Grossman, Acad. Press, N.Y., Vol. 153, pp. 516-544, 1987; Glover, DNA Cloning, Vol. II, IRL Press, Wash., D.C., Ch. 3, 1986; Bitter, “Heterologous Gene Expression in Yeast,” Methods in Enzymology, Eds. Berger & Kimmel, Acad. Press, N.Y., Vol. 152, pp. 673-684, 1987; and The Molecular Biology of the Yeast Saccharomyces, Eds. Strathern et al., Cold Spring Harbor Press, Vols. I and II, 1982). A constitutive yeast promoter, such as ADH or LEU2, or an inducible promoter, such as GAL, may be used (“Cloning in Yeast,” Ch. 3, R. Rothstein, in: DNA Cloning Vol. II, A Practical Approach, Ed. D. M. Glover, IRL Press, Wash., D.C., 1986). Alternatively, vectors may be used which promote integration of foreign DNA sequences into the yeast chromosome.

In certain embodiments, it may be desirable to include specialized regions known as telomeres at the end of a target polynucleotide sequence. Telomeres are repeated sequences found at chromosome ends, and it has long been known that chromosomes with truncated ends are unstable, tend to fuse with other chromosomes and are otherwise lost during cell division.

Some data suggest that telomeres interact with the nucleoprotein complex and the nuclear matrix. One putative role for telomeres includes stabilizing chromosomes and shielding the ends from degradative enzymes.

Another possible role for telomeres is in replication. According to present doctrine, replication of DNA starts from short RNA primers annealed to the 3′-end of the template. The result of this mechanism is an “end replication problem” in which the region corresponding to the RNA primer is not replicated. Over many cell divisions, this will result in the progressive truncation of the chromosome. It is thought that telomeres may provide a buffer against this effect, at least until they are themselves eliminated by this effect. A further structure that may be included in a target polynucleotide is a centromere.

In certain embodiments of the invention, the delivery of a nucleic acid into a cell may be identified in vitro or in vivo by including a marker in the expression construct. The marker would result in an identifiable change to the transfected cell, permitting easy identification of expression.

An expression vector of the invention can be used to transform a target cell. By “transformation” is meant a genetic change induced in a cell following incorporation of new DNA (i.e., DNA exogenous to the cell). Where the cell is a mammalian cell, the genetic change is generally achieved by introduction of the DNA into the genome of the cell. By “transformed cell” is meant a cell into which (or into an ancestor of which) such exogenous DNA has been introduced by means of recombinant DNA techniques. Transformation of a host cell with recombinant DNA may be carried out by conventional techniques as are well known to those skilled in the art. Where the host is prokaryotic, such as E. coli, competent cells that are capable of DNA uptake can be prepared from cells harvested after exponential growth phase and subsequently treated by the CaCl₂ method by procedures well known in the art. Alternatively, MgCl₂ or RbCl can be used. Transformation can also be performed after forming a protoplast of the host cell or by electroporation.

A target polypeptide of the invention can be produced in prokaryotes by expression of nucleic acid encoding the polypeptide. Suitable prokaryotic hosts include, but are not limited to, microorganisms such as bacteria transformed with recombinant bacteriophage DNA, plasmid DNA, or cosmid DNA expression vectors encoding a polypeptide of the invention. The constructs can be expressed in E. coli in large scale for in vitro assays. Purification from bacteria is simplified when the construct contains a tag for isolation of the polypeptide. For example, a polyhistidine tag of, e.g., six histidine residues, can be incorporated at the amino terminal end, or carboxy terminal end, of the protein. The polyhistidine tag allows convenient isolation of the protein in a single step by nickel-chelate chromatography. The target polypeptide of the invention can also be engineered to contain a cleavage site to aid in protein recovery. Alternatively, the polypeptides of the invention can be expressed directly in a desired host cell for assays in situ.

When the host is a eukaryote, such methods of transfection of DNA as calcium phosphate co-precipitates, conventional mechanical procedures, such as microinjection, electroporation or biolistic techniques, insertion of a plasmid encased in liposomes, or virus vectors may be used. Eukaryotic cells can also be cotransfected with DNA sequences encoding a polypeptide of the invention, and a second foreign DNA molecule encoding a selectable phenotype, such as the herpes simplex thymidine kinase gene. Another method is to use a eukaryotic viral vector, such as simian virus 40 (SV40) or bovine papilloma virus, to transiently infect or transform eukaryotic cells and express the protein (Eukaryotic Viral Vectors, Cold Spring Harbor Laboratory, Gluzman ed., 1982). Preferably, a eukaryotic host is utilized as the host cell, as described herein.

Eukaryotic systems, and preferably mammalian expression systems, allow for proper post-translational modifications of expressed mammalian proteins to occur. Eukaryotic cells that possess the cellular machinery for proper processing of the primary transcript, glycosylation, phosphorylation, and advantageously secretion of the gene product should be used as host cells for the expression of the polypeptide of the invention. Such host cell lines may include, but are not limited to, CHO, VERO, BHK, HeLa, COS, MDCK, Jurkat, HEK-293, and WI38.

For long-term, high-yield production of recombinant proteins, stable expression is preferred. Rather than using expression vectors that contain viral origins of replication, host cells can be transformed with the cDNA encoding a target polypeptide of the invention controlled by appropriate expression control elements (e.g., promoter, enhancer sequences, transcription terminators, polyadenylation sites, etc.), and a selectable marker. The selectable marker in the recombinant plasmid confers resistance to the selection and allows cells to stably integrate the plasmid into their chromosomes and grow to form foci that, in turn, can be cloned and expanded into cell lines. For example, following the introduction of foreign DNA, engineered cells may be allowed to grow for 1-2 days in an enriched medium, and then are switched to a selective medium. A number of selection systems may be used, including, but not limited to, the herpes simplex virus thymidine kinase (Wigler et al., Cell, 11:223, 1977), hypoxanthine-guanine phosphoribosyltransferase (Szybalska & Szybalski, Proc. Natl. Acad. Sci. USA, 48:2026, 1962), and adenine phosphoribosyltransferase (Lowy et al., Cell, 22:817, 1980) genes, which can be employed in tk⁻, hgprt⁻ or aprt⁻ cells, respectively. Also, antimetabolite resistance can be used as the basis of selection for dhfr, which confers resistance to methotrexate (Wigler et al., Proc. Natl. Acad. Sci. USA, 77:3567, 1980; O'Hare et al., Proc. Natl. Acad. Sci. USA, 78:1527, 1981); gpt, which confers resistance to mycophenolic acid (Mulligan & Berg, Proc. Natl. Acad. Sci. USA, 78:2072, 1981); neo, which confers resistance to the aminoglycoside G-418 (Colberre-Garapin et al., J. Mol. Biol., 150:1, 1981); and hygro, which confers resistance to hygromycin (Santerre et al., Gene, 30:147, 1984). Recently, additional selectable genes have been described, namely trpB, which allows cells to utilize indole in place of tryptophan; hisD, which allows cells to utilize histidinol in place of histidine (Hartman & Mulligan, Proc. Natl. Acad. Sci. USA, 85:8047, 1988); and ODC (ornithine decarboxylase), which confers resistance to the ornithine decarboxylase inhibitor, 2-(difluoromethyl)-DL-ornithine, DFMO (McConlogue L., In: Current Communications in Molecular Biology, Cold Spring Harbor Laboratory, ed., 1987).

Techniques for the isolation and purification of either microbially or eukaryotically expressed polypeptides of the invention may be by any conventional means, such as, for example, preparative chromatographic separations and immunological separations, such as those involving the use of monoclonal or polyclonal antibodies or antigen.

A target polynucleotide, or expression construct containing a target polynucleotide, may be entrapped in a liposome. Liposomes are vesicular structures characterized by a phospholipid bilayer membrane and an inner aqueous medium. Multilamellar liposomes have multiple lipid layers separated by aqueous medium and form spontaneously when phospholipids are suspended in an excess of aqueous solution. The lipid components undergo self-rearrangement before the formation of closed structures and entrap water and dissolved solutes between the lipid bilayers. The liposome may be complexed with a hemagglutinating virus (HVJ). This has been shown to facilitate fusion with the cell membrane and promote cell entry of liposome-encapsulated DNA. In other embodiments, the liposome may be complexed or employed in conjunction with nuclear non-histone chromosomal proteins (HMG-1). In yet further embodiments, the liposome may be complexed or employed in conjunction with both HVJ and HMG-1. Because such expression constructs have been successfully employed in transfer and expression of nucleic acid in vitro and in vivo, they are applicable to the present invention. Where a bacterial promoter is employed in the DNA construct, it also will be desirable to include within the liposome an appropriate bacterial polymerase.

The present invention describes methods for enabling the creation of a target polynucleotide based upon information only, i.e., without the requirement for existing genes, DNA molecules or genomes. Generally, using computer software, it is possible to construct a virtual polynucleotide in the computer. This polynucleotide consists of a string of DNA bases, G, A, T or C, comprising for example an entire artificial polynucleotide sequence in a linear string. Following construction of a sequence, computer software is then used to parse the target sequence, breaking it down into a set of overlapping oligonucleotides of specified length. This results in a set of shorter DNA sequences that overlap to cover the entire length of the target polynucleotide in overlapping sets.

Typically, a gene of 1000 base pairs would be broken down into 20 100-mers, where 10 of these comprise one strand and 10 comprise the other strand. They would be selected to overlap on each strand by 25 to 50 base pairs.
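The arithmetic behind this example can be checked with a short Python sketch. The function name `oligo_counts` is hypothetical, and the sketch assumes the gene length is an exact multiple of the oligonucleotide length, which the text does not require.

```python
def oligo_counts(gene_bp: int, oligo_len: int) -> dict:
    """Count the oligonucleotides needed to cover both strands of a gene.

    Assumption (not from the source): the gene length divides evenly by
    the oligonucleotide length, so no partial oligonucleotides are needed.
    """
    if gene_bp % oligo_len != 0:
        raise ValueError("gene length must be a multiple of the oligo length")
    per_strand = gene_bp // oligo_len
    return {"per_strand": per_strand, "total": 2 * per_strand}


# A 1000 bp gene parsed into 100-mers: 10 per strand, 20 in total.
print(oligo_counts(1000, 100))
```

The same calculation gives 96 oligonucleotides for a 2400 bp gene parsed into 50-mers.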

The degeneracy of the genetic code permits substantial freedom in the choice of codons for any particular amino acid sequence. Transgenic organisms such as plants frequently prefer particular codons that, though they encode the same protein, may differ from the codons in the organism from which the gene was derived. For example, U.S. Pat. No. 5,380,831 to Adang et al. describes the creation of insect resistant transgenic plants that express the Bacillus thuringiensis (Bt) toxin gene. The Bt crystal protein, an insect toxin, is encoded by a full-length gene that is poorly expressed in transgenic plants. In order to improve expression in plants, a synthetic gene encoding the protein containing codons preferred in plants was substituted for the natural sequence. The invention disclosed therein comprised a chemically synthesized gene encoding an insecticidal protein which is functionally equivalent to a native insecticidal protein of Bt. The synthetic gene was designed to be expressed in plants at a level higher than a native Bt gene.
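To give a sense of how much freedom the degeneracy provides, the following Python sketch counts the distinct DNA sequences that encode a given peptide under the standard genetic code. The names `GENETIC_CODE` and `synonymous_sequence_count` are illustrative and do not appear in the source.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

GENETIC_CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Number of codons available for each amino acid (e.g. 6 for leucine).
CODONS_PER_RESIDUE = Counter(GENETIC_CODE.values())


def synonymous_sequence_count(protein: str) -> int:
    """Return how many DNA sequences encode the given one-letter peptide."""
    total = 1
    for residue in protein.upper():
        if residue not in CODONS_PER_RESIDUE:
            raise ValueError(f"unknown residue: {residue!r}")
        total *= CODONS_PER_RESIDUE[residue]
    return total


print(synonymous_sequence_count("MKL"))
```

Even a three-residue peptide such as Met-Lys-Leu has 12 possible encodings, so the choice of codons for a full-length gene is effectively a design decision.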

In designing a target polynucleotide that encodes a particular polypeptide, the hydropathic index of amino acids may be considered. The importance of the hydropathic amino acid index in conferring interactive biologic function on a protein is generally understood in the art. Each amino acid has been assigned a hydropathic index on the basis of its hydrophobicity and charge characteristics. These are: isoleucine (+4.5); valine (+4.2); leucine (+3.8); phenylalanine (+2.8); cysteine/cystine (+2.5); methionine (+1.9); alanine (+1.8); glycine (−0.4); threonine (−0.7); serine (−0.8); tryptophan (−0.9); tyrosine (−1.3); proline (−1.6); histidine (−3.2); glutamate (−3.5); glutamine (−3.5); aspartate (−3.5); asparagine (−3.5); lysine (−3.9); and arginine (−4.5).

It is known in the art that certain amino acids may be substituted by other amino acids having a similar hydropathic index or score and still result in a protein with similar biological activity, i.e., still obtain a biologically functionally equivalent protein. In making such changes, the substitution of amino acids whose hydropathic indices are within ±2 is preferred, those which are within ±1 are particularly preferred, and those within ±0.5 are even more particularly preferred.
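The preference tiers can be expressed as a small lookup, sketched here in Python. The index values are the hydropathic indices listed in this description; the function name `substitution_tier` and the label returned for differences greater than 2 are illustrative additions.

```python
# Hydropathic indices as listed in this description (one-letter codes).
HYDROPATHY = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9,
    "R": -4.5,
}


def substitution_tier(original: str, replacement: str) -> str:
    """Classify a substitution by the difference in hydropathic index."""
    # Round to one decimal so floating-point noise cannot move a value
    # across a tier boundary.
    diff = round(abs(HYDROPATHY[original] - HYDROPATHY[replacement]), 1)
    if diff <= 0.5:
        return "even more particularly preferred"
    if diff <= 1.0:
        return "particularly preferred"
    if diff <= 2.0:
        return "preferred"
    return "outside the stated ranges"  # label assumed for illustration


print(substitution_tier("I", "V"))  # difference of 0.3
print(substitution_tier("F", "V"))  # difference of 1.4
```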

It is also understood in the art that the substitution of like amino acids can be made effectively on the basis of hydrophilicity. U.S. Pat. No. 4,554,101, incorporated herein by reference, states that the greatest local average hydrophilicity of a protein, as governed by the hydrophilicity of its adjacent amino acids, correlates with a biological property of the protein.

As detailed in U.S. Pat. No. 4,554,101, the following hydrophilicity values have been assigned to amino acid residues: arginine (+3.0); lysine (+3.0); aspartate (+3.0 ±1); glutamate (+3.0 ±1); serine (+0.3); asparagine (+0.2); glutamine (+0.2); glycine (0); threonine (−0.4); proline (−0.5 ±1); alanine (−0.5); histidine (−0.5); cysteine (−1.0); methionine (−1.3); valine (−1.5); leucine (−1.8); isoleucine (−1.8); tyrosine (−2.3); phenylalanine (−2.5); tryptophan (−3.4).

It is understood that an amino acid can be substituted for another having a similar hydrophilicity value and still obtain a biologically equivalent and immunologically equivalent polypeptide. In such changes, the substitution of amino acids whose hydrophilicity values are within ±2 is preferred, those that are within ±1 are particularly preferred, and those within ±0.5 are even more particularly preferred.
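The "greatest local average hydrophilicity" mentioned above can be located with a sliding window, as in the following Python sketch. The window length of six residues is an assumption made for illustration, the central value is used for residues listed with a ±1 range, and the function name is hypothetical.

```python
# Hydrophilicity values as listed in this description (central values
# are used for aspartate, glutamate and proline, which carry a +/-1 range).
HYDROPHILICITY = {
    "R": 3.0, "K": 3.0, "D": 3.0, "E": 3.0, "S": 0.3, "N": 0.2, "Q": 0.2,
    "G": 0.0, "T": -0.4, "P": -0.5, "A": -0.5, "H": -0.5, "C": -1.0,
    "M": -1.3, "V": -1.5, "L": -1.8, "I": -1.8, "Y": -2.3, "F": -2.5,
    "W": -3.4,
}


def greatest_local_hydrophilicity(protein: str, window: int = 6):
    """Return (start index, average) of the most hydrophilic window.

    The default window of 6 residues is an assumption for this sketch.
    Ties are resolved in favour of the earliest window.
    """
    scores = [HYDROPHILICITY[residue] for residue in protein.upper()]
    if len(scores) < window:
        raise ValueError("protein is shorter than the averaging window")
    best_start, best_average = 0, None
    for start in range(len(scores) - window + 1):
        average = sum(scores[start:start + window]) / window
        if best_average is None or average > best_average:
            best_start, best_average = start, average
    return best_start, best_average


# The charged stretch KKDEKR (0-based positions 4 to 9) scores highest.
print(greatest_local_hydrophilicity("LLLLKKDEKRLLLL"))
```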

As outlined above, amino acid substitutions are generally based on the relative similarity of the amino acid side-chain substituents, for example, their hydrophobicity, hydrophilicity, charge, size, and the like. Exemplary substitutions that take various of the foregoing characteristics into consideration are well known to those of skill in the art and include: arginine and lysine; glutamate and aspartate; serine and threonine; glutamine and asparagine; and valine, leucine and isoleucine.

Aspects of the invention may be implemented in hardware or software, or a combination of both. However, preferably, the algorithms and processes of the invention are implemented in one or more computer programs executing on programmable computers each comprising at least one processor, at least one data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. Program code is applied to input data to perform the functions described herein and generate output information. The output information is applied to one or more output devices, in known fashion.

Each program may be implemented in any desired computer language (including machine, assembly, high level procedural, or object oriented programming languages) to communicate with a computer system. In any case, the language may be a compiled or interpreted language.

Each such computer program is preferably stored on a storage medium or device (e.g., ROM, CD-ROM, tape, or magnetic diskette) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage medium or device is read by the computer to perform the procedures described herein. The inventive system may also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein.

Thus, in another embodiment, the invention provides a computer program, stored on a computer-readable medium, for generating a target polynucleotide sequence. The computer program includes instructions for causing a computer system to: 1) identify an initiating polynucleotide sequence contained in the target polynucleotide sequence; 2) parse the target polynucleotide sequence into multiple distinct, partially complementary oligonucleotides; and 3) control assembly of the target polynucleotide sequence by controlling the bi-directional extension of the initiating polynucleotide sequence by the sequential addition of partially complementary oligonucleotides resulting in a contiguous double-stranded polynucleotide. The computer program will contain an algorithm for parsing the sequence of the target polynucleotide by generating a set of oligonucleotides corresponding to a polypeptide sequence. The algorithm utilizes a polypeptide sequence to generate a DNA sequence using a specified codon table. The algorithm then generates a set of parsed oligonucleotides corresponding to the (+) and (−) strands of the DNA sequence in the following manner:

1. The DNA sequence GENE[ ], an array of bases, is generated from the protein sequence AA[ ], an array of amino acids, using a specified codon table. An example of the codon table for E. coli class II codons is listed below.

    a. Parameters
        i. N: length of protein in amino acid residues
        ii. L = 3N: length of gene in DNA bases
        iii. Q: length of each component oligonucleotide
        iv. X = Q/2: length of overlap between oligonucleotides
        v. W = 3N/Q: number of oligonucleotides in the F set
        vi. Z = 3N/Q + 1: number of oligonucleotides in the R set
        vii. F[1:W]: set of (+) strand oligonucleotides
        viii. R[1:Z]: set of (−) strand oligonucleotides
        ix. AA[1:N]: array of amino acid residues
        x. GENE[1:L]: array of bases comprising the gene
    b. Obtain or design a protein sequence AA[ ] consisting of a list of amino acid residues.
    c. Generate the DNA sequence, GENE[ ], from the protein sequence, AA[ ]
        i. Set I=1 and J=1
        ii. Translate AA[J] from codon table generating GENE[I:I+2]
        iii. I=I+3
        iv. J=J+1
        v. If J ≤ N, go to ii

2. Two sets of overlapping oligonucleotides are generated from GENE[ ]; F[ ] covers the (+) strand and R[ ] is a complementary, partially overlapping set covering the (−) strand.

    a. Generate the F[ ] set of oligos
        i. Set K=1; for I=1 to W
        ii. F[I]=GENE[K:K+Q−1]
        iii. K=K+Q
        iv. Go to ii
    b. Generate the R[ ] set of oligos
        i. J=L−X
        ii. For I=1 to W−1
        iii. R[I]=complement of GENE[J:J−Q+1]
        iv. J=J−Q
        v. Go to iii
    c. Result is two sets of oligos, F[ ] and R[ ], of length Q
    d. Generate the final two finishing oligos
        i. S[1]=complement of GENE[Q/2:1]
        ii. S[2]=complement of GENE[L:L−Q/2+1]

Subsequently, oligonucleotide set assembly is established by the following algorithm:

Input: the oligonucleotide sets F[1:W], R[1:Z] and S[1:2]

3. Step 1

    a. For I=1 to W
    b. Ligate F[I], F[I+1], R[I]; place in T[I]
    c. Ligate F[I+2], R[I+1], R[I+2]; place in T[I+1]
    d. I=I+3
    e. Go to b

4. Step 2

    a. Do the following until only a single reaction remains
        i. For I=1 to W/3
        ii. Ligate T[I], T[I+1]
        iii. I=I+2
        iv. Go to ii
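The two assembly steps can be traced with oligonucleotide labels in place of sequences, as in the following Python sketch. It only tracks which oligonucleotides end up in which reaction; "ligation" is modeled as merging groups of labels. The function names are hypothetical, and the sketch assumes equal numbers of forward and reverse oligonucleotides in multiples of three.

```python
def form_triplexes(forward, reverse):
    """Step 1: group each run of three F and three R oligos into two triplexes."""
    if len(forward) != len(reverse) or len(forward) % 3 != 0:
        raise ValueError("expected equal-length sets in multiples of three")
    triplexes = []
    for i in range(0, len(forward), 3):
        triplexes.append((forward[i], forward[i + 1], reverse[i]))
        triplexes.append((forward[i + 2], reverse[i + 1], reverse[i + 2]))
    return triplexes


def pool_pairwise(products):
    """Step 2: merge neighbouring products until a single reaction remains.

    Returns the list of products present after each round, starting with
    the input. An unpaired product is carried forward unchanged.
    """
    rounds = [list(products)]
    while len(rounds[-1]) > 1:
        current = rounds[-1]
        merged = [sum(current[i:i + 2], ()) for i in range(0, len(current), 2)]
        rounds.append(merged)
    return rounds


forward = [f"F{i}" for i in range(1, 7)]
reverse = [f"R{i}" for i in range(1, 7)]
for stage in pool_pairwise(form_triplexes(forward, reverse)):
    print(len(stage), "reaction(s):", stage)
```

With six forward and six reverse oligonucleotides the trace shows four triplexes, then two products of six oligonucleotides, then a single product of twelve.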

CODON TABLE (E. coli Class II preferred usage)

    PHE TTC    SER TCT    TYR TAC    CYS TGC
    TER TGA    TRP TGG    ILE ATC    MET ATG
    THR ACC    LEU CTG    PRO CCG    HIS CAC
    GLN CAG    ARG CGT    VAL GTT    ALA GCG
    ASN AAC    LYS AAA    ASP GAC    GLU GAA
    GLY GGT
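As an illustration of how such a table is applied, the following Python sketch back-translates a one-letter protein sequence into DNA by choosing the single preferred codon for each residue. The dictionary and function names are hypothetical. Cysteine is entered as TGC, the codon used in the Perl listing of ALGORITHM 1.

```python
# One preferred codon per residue (E. coli class II usage as tabulated in
# this description); "*" is used here for the TER (stop) entry.
CLASS_II_CODONS = {
    "F": "TTC", "S": "TCT", "Y": "TAC", "C": "TGC", "*": "TGA", "W": "TGG",
    "I": "ATC", "M": "ATG", "T": "ACC", "L": "CTG", "P": "CCG", "H": "CAC",
    "Q": "CAG", "R": "CGT", "V": "GTT", "A": "GCG", "N": "AAC", "K": "AAA",
    "D": "GAC", "E": "GAA", "G": "GGT",
}


def back_translate(protein: str) -> str:
    """Convert a one-letter protein sequence into a DNA sequence."""
    try:
        return "".join(CLASS_II_CODONS[residue] for residue in protein.upper())
    except KeyError as error:
        raise ValueError(f"no codon listed for residue {error.args[0]!r}") from None


print(back_translate("MKW"))  # ATG + AAA + TGG
```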

Algorithms of the invention useful for assembly of a target polynucleotide can further be described as Perl script as set forth below. ALGORITHM 1 provides a method for converting a protein sequence into a polynucleotide sequence using E. coli codons:

    # $sequence is the protein sequence in single letter amino acid code
    # $seqlen is the length of the protein sequence
    # $aminoacid is the individual amino acid in the sequence
    # $codon is the individual DNA triplet codon in the gene sequence
    # $DNAsequence is the gene sequence in DNA bases
    # $baselen is the length of the DNA sequence in bases
    $seqlen = length($sequence);
    $baselen = $seqlen * 3;
    $DNAsequence = "";
    for ($n = 0; $n < $seqlen; $n++) {
        $aminoacid = substr($sequence,$n,1);
        # The following list provides the class II codon preference for E. coli
        if ($aminoacid eq "m") {$codon = "ATG";}
        elsif ($aminoacid eq "f") {$codon = "TTC";}
        elsif ($aminoacid eq "l") {$codon = "CTG";}
        elsif ($aminoacid eq "s") {$codon = "TCT";}
        elsif ($aminoacid eq "y") {$codon = "TAC";}
        elsif ($aminoacid eq "c") {$codon = "TGC";}
        elsif ($aminoacid eq "w") {$codon = "TGG";}
        elsif ($aminoacid eq "i") {$codon = "ATC";}
        elsif ($aminoacid eq "t") {$codon = "ACC";}
        elsif ($aminoacid eq "p") {$codon = "CCG";}
        elsif ($aminoacid eq "q") {$codon = "CAG";}
        elsif ($aminoacid eq "r") {$codon = "CGT";}
        elsif ($aminoacid eq "v") {$codon = "GTT";}
        elsif ($aminoacid eq "a") {$codon = "GCG";}
        elsif ($aminoacid eq "n") {$codon = "AAC";}
        elsif ($aminoacid eq "k") {$codon = "AAA";}
        elsif ($aminoacid eq "d") {$codon = "GAC";}
        elsif ($aminoacid eq "e") {$codon = "GAA";}
        elsif ($aminoacid eq "g") {$codon = "GGT";}
        elsif ($aminoacid eq "h") {$codon = "CAC";}
        else {$codon = "";}
        # "." is the Perl string concatenation operator
        $DNAsequence = $DNAsequence . $codon;
    }

ALGORITHM 2 provides a method for parsing a polynucleotide sequence into component forward and reverse oligonucleotides that can be reassembled into a complete target polynucleotide encoding a target polypeptide:

    # $oligoname is the identifier name for the list and for each component
    #   oligonucleotide
    # $OL is the length of each component oligonucleotide
    # $Overlap is the length of the overlap in bases between each forward and
    #   each reverse oligonucleotide
    # $sequence is the DNA sequence in bases
    # $seqlen is the length of the DNA sequence in bases
    # $bas is the individual base in a sequence
    # $forseq is the sequence of a forward oligonucleotide
    # $revseq is the sequence of a reverse oligonucleotide
    # $revcomp is the reverse complemented sequence of the gene
    # $oligoname F-[ ] is the list of parsed forward oligos
    # $oligoname R-[ ] is the list of parsed reverse oligos
    # ($sequence, $OL, $oligoname and the OUT filehandle are set elsewhere)
    $Overlap = <STDIN>;
    chomp($Overlap);
    $seqlen = length($sequence);
    # convert forward sequence to upper case if lower case
    $forseq = "";
    for ($j = 0; $j <= $seqlen-1; $j++) {
        $bas = substr($sequence,$j,1);
        if ($bas eq "a") {$cfor = "A";}
        elsif ($bas eq "t") {$cfor = "T";}
        elsif ($bas eq "c") {$cfor = "C";}
        elsif ($bas eq "g") {$cfor = "G";}
        elsif ($bas eq "A") {$cfor = "A";}
        elsif ($bas eq "T") {$cfor = "T";}
        elsif ($bas eq "C") {$cfor = "C";}
        elsif ($bas eq "G") {$cfor = "G";}
        else {$cfor = "X";}
        $forseq = $forseq.$cfor;
        print OUT "$j \n";
    }

The reverse complement of the sequence generated above is identified by:

    $revcomp = "";
    for ($i = $seqlen-1; $i >= 0; $i--) {
        $base = substr($sequence,$i,1);
        if ($base eq "a") {$comp = "T";}
        elsif ($base eq "t") {$comp = "A";}
        elsif ($base eq "g") {$comp = "C";}
        elsif ($base eq "c") {$comp = "G";}
        elsif ($base eq "A") {$comp = "T";}
        elsif ($base eq "T") {$comp = "A";}
        elsif ($base eq "G") {$comp = "C";}
        elsif ($base eq "C") {$comp = "G";}
        else {$comp = "X";}
        $revcomp = $revcomp.$comp;
    }
    # now do the parsing
    # generate the forward oligo list
    print OUT "Forward oligos\n";
    print "Forward oligos\n";
    $r = 1;
    for ($i = 0; $i <= $seqlen-1; $i += $OL) {
        $oligo = substr($forseq,$i,$OL);
        print OUT "$oligoname F- $r  $oligo\n";
        print "$oligoname F- $r  $oligo\n";
        $r = $r + 1;
    }
    # generate the reverse oligo list
    $r = 1;
    for ($i = $seqlen - $Overlap - $OL; $i >= 0; $i -= $OL) {
        print OUT "\n";
        print "\n";
        $oligo = substr($revcomp,$i,$OL);
        print OUT "$oligoname R- $r  $oligo";
        print "$oligoname R- $r  $oligo";
        $r = $r + 1;
    }
    # Rectify and print out the last reverse oligo consisting of 1/2 from
    # the beginning of the reverse complement (Perl offsets start at 0).
    $oligo = substr($revcomp,0,$Overlap);
    print OUT "$oligo\n";
    print "$oligo\n";
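For comparison, the parsing idea can be sketched compactly in Python. This is not a transcription of the Perl listing. It assumes a layout in which the reverse oligonucleotides are offset by half an oligonucleotide length so that each one bridges two neighbouring forward oligonucleotides, with two half-length finishing oligonucleotides at the ends. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def parse_oligos(gene: str, oligo_len: int):
    """Split a gene into forward, reverse and finishing oligonucleotides.

    Assumptions for this sketch: oligo_len is even, and the gene length is
    an exact multiple of oligo_len.
    """
    gene = gene.upper()
    if oligo_len % 2 != 0 or len(gene) % oligo_len != 0:
        raise ValueError("need an even oligo length that divides the gene length")
    half = oligo_len // 2
    # Forward set: consecutive, non-overlapping windows on the (+) strand.
    forward = [gene[i:i + oligo_len] for i in range(0, len(gene), oligo_len)]
    # Reverse set: windows shifted by half a length, read on the (-) strand.
    reverse = [
        reverse_complement(gene[i:i + oligo_len])
        for i in range(half, len(gene) - half, oligo_len)
    ]
    # Finishing oligos: the two half-length ends of the (-) strand.
    finishing = [reverse_complement(gene[:half]), reverse_complement(gene[-half:])]
    return forward, reverse, finishing


forward, reverse, finishing = parse_oligos("ATGAAATGGCTG", 4)
print(forward)
print(reverse)
print(finishing)
```

Each reverse oligonucleotide pairs with the second half of one forward oligonucleotide and the first half of the next, which is what holds adjacent forward oligonucleotides together for ligation.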

The invention further provides a computer-assisted method for synthesizing a target polynucleotide encoding a target polypeptide derived from a model sequence using a programmed computer including a processor, an input device, and an output device, by inputting into the programmed computer, through the input device, data including at least a portion of the target polynucleotide sequence encoding a target polypeptide. Subsequently, the sequence of at least one initiating polynucleotide present in the target polynucleotide sequence is determined and a model for synthesizing the target polynucleotide sequence is derived. The model is based on the position of the initiating sequence in the target polynucleotide sequence using overall sequence parameters necessary for expression of the target polypeptide in a biological system. The information is outputted to an output device which provides the means for synthesizing and assembling the target polynucleotide.

It is understood that any apparatus suitable for polynucleotide synthesis can be used in the present invention. Various non-limiting examples of apparatus, components, assemblies and methods are described below. For example, in one embodiment, it is contemplated that a nanodispensing head with up to 16 valves can be used to deposit synthesis chemicals in assembly vessels (FIG. 4). Chemicals can be controlled using a syringe pump from the reagent reservoir. Because of the speed and capability of the ink-jet dispensing system, synthesis can be made very small and very rapid. Underlying the reaction chambers is a set of assembly vessels linked to microchannels that will move fluids by microfluidics. The configuration of the channels will pool pairs and triplexes of oligonucleotides systematically using, for example, a robotic device. However, pooling can be accomplished using fluidics and without moving parts.

As shown in FIG. 5, oligonucleotide synthesis, oligonucleotide assembly by pooling and annealing, and ligation can be done using microfluidic mixing, resulting in the same set of critical triplex intermediates that serves as the substrate for annealing, ligation and oligonucleotide joining. DNA ligase and other components can be placed in the buffer fluid moving through the instrument microchambers. Thus, synthesis and assembly can be carried out in a highly controlled way in the same instrument.

As shown in FIG. 6, the pooling manifold can be produced from non-porous plastic and designed to control sequential pooling of oligonucleotides synthesized in arrays. Oligonucleotide parsing from a gene sequence designed in the computer can be programmed for synthesis where (+) and (−) strands are placed in alternating wells of the array. Following synthesis in this format, the 12 row sequences of the gene are directed into the pooling manifold that systematically pools three wells into reaction vessels forming the critical triplex structure. Following temperature cycling for annealing and ligation, four sets of triplexes are pooled into 2 sets of 6 oligonucleotide products, then 1 set of 12 oligonucleotide products. Each row of the synthetic array is associated with a similar manifold, resulting in the first stage of assembly of 8 sets of assembled oligonucleotides representing 12 oligonucleotides each. As shown in FIG. 7, the second manifold pooling stage is controlled by a single manifold that pools the 8 row assemblies into a single complete assembly. Passage of the oligonucleotide components through the two manifold assemblies (the first 8 and the second single) results in the complete assembly of all 96 oligonucleotides from the array. The assembly module (FIG. 8) of Genewriter™ can include a complete set of 7 pooling manifolds produced using microfabrication in a single plastic block that sits below the synthesis vessels. Various configurations of the pooling manifold will allow assembly of 96-, 384- or 1536-well arrays of parsed component oligonucleotides.

The initial configuration is designed for the assembly of 96 oligonucleotides synthesized in a pre-defined array, composed of 48 pairs of overlapping 50-mers. Passage through the assembly device in the presence of DNA ligase and other appropriate buffer and chemical components, and with appropriate temperature controls on the device, will assemble these into a single 2400-base double-stranded gene assembly (FIG. 9).

The basic pooling device design can be made of Plexiglas™ or other type of co-polymer with microgrooves or microfluidic channels etched into the surface and with a temperature control element such as a Peltier circuit underlying the junction of the channels. This results in a microreaction vessel at the junction of two channels for 1) mixing of the two streams, 2) controlled temperature maintenance or cycling at the site of the junction and 3) expulsion of the ligated mixture from the exit channel into the next set of pooling and ligation chambers.

As shown in FIG. 11, the assembly platform design can consist of 8 synthesis microwell plates in a 96-well configuration, addressed with 16 channels of microdispensing. Below each plate is: 1) an evacuation manifold for removing synthesis components; and 2) an assembly manifold based on the schematic in FIG. 9 for assembling component oligonucleotides from each 96-well array. FIG. 12 shows a higher capacity assembly format using 1536-well microplates and capable of synthesis of 1536 component oligonucleotides per plate. Below each plate is: 1) an evacuation manifold for removing synthesis components; and 2) an assembly manifold for assembling 1536 component oligonucleotides from each 1536-well array. Pooling and assembly strategies can be based on the concepts used for 96-well plates.

An alternative assembly format includes using surface-bound oligonucleotide synthesis rather than soluble synthesis on CPG glass beads (FIG. 13). In this configuration, oligonucleotides are synthesized with a hydrocarbon linker that allows attachment to a solid support. Following parsing of component sequences and synthesis, the synthesized oligonucleotides are covalently attached to a solid support such that the stabilizer is attached and the two ligation substrates are added to the overlying solution. Ligation occurs as mediated by DNA ligase in the solution, and increasing the temperature above the Tm removes the linked oligonucleotides by thermal melting. As shown in FIG. 14, the systematic assembly on a solid support of a set of parsed component oligonucleotides can be arranged in an array with the set of stabilizer oligonucleotides attached. The set of ligation substrate oligonucleotides is placed in the solution, and systematic assembly is carried out in the solid phase by sequential annealing, ligation and melting, which moves the growing DNA molecules across the membrane surface.

FIG. 15 shows an additional alternative means for oligonucleotide assembly, by binding the component oligonucleotides to a set of metal electrodes on a microelectronic chip, where each electrode can be controlled independently with respect to current and voltage. The array contains the set of minus strand oligonucleotides. Placing a positive charge on the electrode will move, by electrophoresis, the component ligase substrate oligonucleotide onto the surface where annealing takes place. The presence of DNA ligase mediates covalent joining or ligation of the components. The electrode is then turned off or a negative charge is applied, and the DNA molecule is expelled from the electrode. The next array element, containing the next stabilizer oligonucleotide from the parsed set, is turned on with a positive charge, and a second annealing, joining and ligation with the next oligonucleotide in the set is carried out. Systematic and repetitive application of voltage control, annealing, ligation and denaturation will result in the movement of the growing chain across the surface as well as assembly of the components into a complete DNA molecule.

The invention further provides methods for the automated synthesis of target polynucleotides. For example, a desired sequence can be ordered by any means of communication available to a user wishing to order such a sequence. A “user”, as used herein, is any entity capable of communicating a desired polynucleotide sequence to a server. The sequence may be transmitted by any means of communication available to the user and receivable by a server. The user can be provided with a unique designation such that the user can obtain information regarding the synthesis of the polynucleotide during synthesis. Once obtained, the transmitted target polynucleotide sequence can be synthesized by any method set forth in the present invention.

The invention further provides a method for automated synthesis of a polynucleotide, by providing a user with a mechanism for communicating a model polynucleotide sequence and optionally providing the user with an opportunity to communicate at least one desired modification to the model sequence. The invention envisions a user providing a model sequence and a desired modification to that sequence which results in the alteration of the model sequence. Any modification that alters the expression, function or activity of a target polynucleotide or encoded target polypeptide can be communicated by the user such that a modified polynucleotide or polypeptide is synthesized or expressed according to a method of the invention. For example, a model polynucleotide encoding a polypeptide normally expressed in a eukaryotic system can be altered such that the codons of the resulting target polynucleotide are conducive for expression of the polypeptide in a prokaryotic system. In addition, the user can indicate a desired modified activity of a polypeptide encoded by a model polynucleotide. Once provided, the algorithms and methods of the present invention can be used to synthesize a target polynucleotide encoding a target polypeptide believed to have the desired modified activity. The methods of the invention can be further utilized to express the target polypeptide and to screen for the desired activity. It is understood that the methods of the invention provide a means for synthetic evolution whereby any parameter of polynucleotide expression and/or polypeptide activity can be altered as desired.

Once the transmitted model sequence and desired modification are provided by the user, the data, including at least a portion of the model polynucleotide sequence, are input into a programmed computer through an input device. Once the data are input, the algorithms of the invention are used to determine the sequence of the model polynucleotide containing the desired modification, resulting in a target polynucleotide containing the modification. Subsequently, the processor and algorithms of the invention are used to identify at least one initiating polynucleotide sequence present in the polynucleotide sequence. A target polynucleotide (i.e., a modified model polynucleotide) is identified and synthesized.

EXAMPLES

Nucleic Acid Synthesis Design Protocol

For the purposes of assembling a synthetic nucleic acid sequence encoding a target polypeptide, a model polypeptide sequence or nucleic acid sequence is obtained and analyzed using a suitable DNA analysis package, such as, for example, MacVector or DNA Star. If the target protein will be expressed in a bacterial system, for example, the model sequence can be converted to a sequence encoding a polypeptide utilizing E. coli preferred codons (i.e., Type I, Type II or Type III codon preference). The present invention provides the conversion programs Codon I, Codon II or Codon III. A nucleic acid sequence of the invention can be designed to accommodate any codon preference of any prokaryotic or eukaryotic organism.
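The conversion of a model polypeptide sequence into a codon-adjusted nucleic acid sequence can be illustrated with a minimal Python sketch. The codon table below is an assumption made for illustration (one frequently used E. coli codon per amino acid); it is not the Codon I, Codon II or Codon III table of the invention, and the function name `back_translate` is hypothetical.

```python
# Illustrative table only: one frequently used E. coli codon per amino acid.
# This is an assumption for the sketch, NOT the Codon I/II/III tables.
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",  # stop codon
}


def back_translate(protein: str) -> str:
    """Return a DNA sequence encoding `protein` (one-letter code)."""
    try:
        return "".join(PREFERRED_CODON[aa] for aa in protein.upper())
    except KeyError as err:
        raise ValueError(f"unknown residue: {err.args[0]}") from None


if __name__ == "__main__":
    print(back_translate("MKW*"))
```

A different host organism would be handled by substituting a table built from that organism's codon usage.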

In addition to the above codon preferences, specific promoter, enhancer, replication or drug resistance sequences can be included in a synthetic nucleic acid sequence of the invention. The length of the construction can be adjusted by padding to give a round number of bases based on about 25 to 100 bp synthesis. Sequences of about 25 to 100 bp in length can be manufactured and assembled using the array synthesizer system and may be used without further purification. For example, two 96-well plates containing 100-mers could give a 9600 bp construction of a target sequence.
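The length arithmetic in this paragraph can be checked with a short Python sketch. It assumes that half of the oligonucleotides form the plus strand and half the minus strand, and that "padding to a round number" means rounding up to a whole multiple of the oligonucleotide length; both function names are hypothetical.

```python
def padded_length(n_bases: int, oligo_len: int) -> int:
    """Round a construct length up to a whole number of oligos (assumed rule)."""
    return -(-n_bases // oligo_len) * oligo_len  # integer ceiling division


def construct_length_bp(n_oligos: int, oligo_len: int) -> int:
    """Double-stranded length when half the oligos cover each strand."""
    return (n_oligos // 2) * oligo_len


if __name__ == "__main__":
    # Two 96-well plates of 100-mers, one plate per strand.
    print(construct_length_bp(2 * 96, 100))
    # A 2686-base design padded for synthesis as 50-mers.
    print(padded_length(2686, 50))
```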

Subsequent to the design of the oligonucleotides needed for assembly of the target sequence, the oligonucleotides are parsed using ParseOligo™, a proprietary computer program that optimizes nucleic acid sequence assembly. Optional steps in sequence assembly include identifying and eliminating sequences that may give rise to hairpins, repeats or other difficult sequences. The parsed oligonucleotide list is transferred to the Synthesizer driver software. The individual oligonucleotides are pasted into the wells and oligonucleotide synthesis is accomplished.
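ParseOligo™ is proprietary and its algorithm is not given here, so the following Python sketch is not that program. It only illustrates one simple way a linear target sequence could be parsed into overlapping plus strand and minus strand oligonucleotides, assuming a minus strand offset of half the oligonucleotide length; the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_oligos(target: str, oligo_len: int = 50):
    """Split a linear target into forward and reverse oligos.

    Forward oligos tile the plus strand. Reverse oligos tile the minus
    strand, shifted by half an oligo so that each one bridges two
    neighbouring forward oligos (assumed layout).
    """
    offset = oligo_len // 2
    forward = [target[i:i + oligo_len]
               for i in range(0, len(target), oligo_len)]
    reverse = [reverse_complement(target[i:i + oligo_len])
               for i in range(offset, len(target), oligo_len)]
    return forward, reverse


if __name__ == "__main__":
    fwd, rev = parse_oligos("ACGT" * 375)  # 1500-base placeholder sequence
    print(len(fwd), len(rev), len(rev[-1]))
```

For a 1500-base linear target and 50-mers, this layout yields 30 forward oligonucleotides plus 29 reverse 50-mers and one terminal 25-mer.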

Assembly of Parsed Oligonucleotides Using a Two-Step PCR Reaction:

Obtain arrayed sets of parsed overlapping oligonucleotides, 50 bases each, with an overlap of about 25 base pairs (bp). The oligonucleotide concentration is 250 μM (250 nmol/ml). 50 base oligos give T_(m)s from 75 to 85 degrees C., 6 to 10 OD₂₆₀, 11 to 15 nanomoles, 150 to 300 μg. Resuspend in 50 to 100 μl of H₂O to make 250 nmol/ml. Combine equal amounts of each oligonucleotide to a final concentration of 250 μM (250 nmol/ml): add 1 μl of each to give 192 μl, then add 8 μl dH₂O to bring the volume up to 200 μl. The final concentration is 250 μM mixed oligos. Dilute about 100-fold by adding 10 μl of the mixed oligos to 1 ml of water (1/100; 2.5 μM), then take 1 μl of this dilution and add it to 24 μl of 1×PCR mix. The PCR reaction includes:

-   10 mM TRIS-HCl, pH 9.0
-   2.2 mM MgCl₂
-   50 mM KCl
-   0.2 mM each dNTP
-   0.1% Triton X-100

One U Taq I polymerase is added to the reaction. The reaction is thermocycled under the following conditions:

-   a. Assembly
    -   i. 55 cycles of
        -   1. 94 degrees 30 s
        -   2. 52 degrees 30 s
        -   3. 72 degrees 30 s

Following assembly amplification, take 2.5 μl of this assembly mix and add to 100 μl of PCR mix (40× dilution). Prepare outside primers by taking 1 μl of F1 (forward primer) and 1 μl of R96 (reverse primer) at 250 μM (250 nmol/ml; 0.250 nmol/μl) and add to the 100 μl PCR reaction. This gives a final concentration of 2.5 μM each oligo. Add 1 U Taq I polymerase and thermocycle under the following conditions:

-   35 cycles (or original protocol 23 cycles)
    -   94 degrees 30 s
    -   50 degrees 30 s
    -   72 degrees 60 s

Extract with phenol/chloroform. Precipitate with ethanol. Resuspend in 10 μl of dH₂O and analyze on an agarose gel.
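The two cycling programs of this protocol can be written down as data, which also makes the total hold time easy to estimate. The Python sketch below is illustrative only: the dictionary layout and the function name are assumptions, and ramp times between temperatures are ignored.

```python
# (temperature in degrees C, hold time in seconds) for each step of a cycle
ASSEMBLY = {"cycles": 55, "steps": [(94, 30), (52, 30), (72, 30)]}
AMPLIFICATION = {"cycles": 35, "steps": [(94, 30), (50, 30), (72, 60)]}


def hold_time_minutes(program: dict) -> float:
    """Total programmed hold time in minutes, ignoring ramping."""
    seconds_per_cycle = sum(seconds for _, seconds in program["steps"])
    return program["cycles"] * seconds_per_cycle / 60


if __name__ == "__main__":
    print(hold_time_minutes(ASSEMBLY))
    print(hold_time_minutes(AMPLIFICATION))
```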

Assembly of Parsed Oligonucleotides Using Taq I Ligation

Arrayed sets of parsed overlapping oligonucleotides of about 25 to 150 bases in length each, with an overlap of about 12 to 75 base pairs (bp), are obtained. The oligonucleotide concentration is 250 μM (250 nmol/ml). For example, 50 base oligos give T_(m)s from 75 to 85 degrees C., 6 to 10 OD₂₆₀, 11 to 15 nanomoles, 150 to 300 μg. Resuspend in 50 to 100 μl of H₂O to make 250 nmol/ml.

Using a robotic workstation, equal amounts of forward and reverse oligos are combined pairwise. Take 10 μl of forward and 10 μl of reverse oligo and mix in a new 96-well v-bottom plate. This gives one array with sets of duplex oligonucleotides at 250 μM, according to pooling scheme Step 1 in Table 1. Prepare an assembly plate by taking 2 μl of each oligomer pair and adding to a fresh plate containing 100 μl of ligation mix in each well. This gives an effective concentration of 2.5 μM or 2.5 nmol/ml. Transfer 20 μl of each well to a fresh microwell plate and add 1 μl of T4 polynucleotide kinase and 1 μl of 1 mM ATP to each well. Each reaction will have 50 pmoles of oligonucleotide and 1 nmole ATP. Incubate at 37 degrees C. for 30 minutes.
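The amounts quoted in this step can be reproduced with a short Python sketch. It assumes a 250 μM stock of each oligonucleotide and simple volume-additive mixing; the helper name `dilute` is hypothetical. The computed values come out slightly below the rounded figures in the text (about 2.45 μM and about 49 pmol of each oligonucleotide).

```python
def dilute(conc_uM: float, sample_ul: float, diluent_ul: float) -> float:
    """Concentration after adding `sample_ul` of sample to `diluent_ul`."""
    return conc_uM * sample_ul / (sample_ul + diluent_ul)


stock_uM = 250.0                              # assumed stock of each oligo
pair_uM = dilute(stock_uM, 10, 10)            # 10 ul forward + 10 ul reverse
ligation_uM = dilute(pair_uM, 2, 100)         # 2 ul pair into 100 ul mix
oligo_pmol = ligation_uM * 20                 # uM x ul = pmol in 20 ul
atp_nmol = 1.0 * 1                            # 1 mM x 1 ul = 1 nmol ATP

if __name__ == "__main__":
    print(round(ligation_uM, 2), round(oligo_pmol), atp_nmol)
```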

Initiate assembly according to Steps 2-7 of Table 1. Carry out pooling Step 2 by mixing each successive well with the next. Add 1 μl of Taq I ligase to each mixed well. Cycle once at 94 degrees for 30 s; 52 degrees for 30 s; then 72 degrees for 10 minutes.

Carry out step 3 (Table 1) of the pooling scheme and cycle according to the temperature scheme above. Carry out steps 4 and 5 of the pooling scheme and cycle according to the temperature scheme above. Carry out pooling scheme step 6 and take 10 μl of each mix into a fresh microwell. Carry out step 7 of the pooling scheme by pooling the remaining three wells. Reaction volumes will be:

Initial plate has 20 μl per well.

-   Step 2: 20 μl + 20 μl = 40 μl
-   Step 3: 80 μl
-   Step 4: 160 μl
-   Step 5: 230 μl
-   Step 6: 10 μl + 10 μl = 20 μl
-   Step 7: 20 + 20 + 20 = 60 μl final reaction volume
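The pairwise pooling of Steps 2 to 7 follows a regular pattern: each well is mixed into the next surviving well, and the last three wells are pooled together. The Python sketch below reproduces that pattern for a 96-well plate. It is an illustration under the stated assumption that wells are taken in row-major order (A1 to H12), not the control software of the invention, and the function names are hypothetical.

```python
def plate_wells(rows: str = "ABCDEFGH", cols: int = 12) -> list:
    """Well names of a plate in row-major order: A1, A2, ..., H12."""
    return [f"{r}{c}" for r in rows for c in range(1, cols + 1)]


def pooling_scheme(wells: list) -> list:
    """Return the pooling steps as lists of (FROM, TO) tuples.

    While an even number of wells remains, consecutive wells are paired
    and the TO well of each pair survives. An odd remainder (three wells
    for a 96-well plate) is pooled in a single final step.
    """
    steps = []
    current = list(wells)
    while len(current) > 1:
        if len(current) % 2:
            steps.append([tuple(current)])  # pool all remaining wells
            break
        pairs = [(current[i], current[i + 1])
                 for i in range(0, len(current), 2)]
        steps.append(pairs)
        current = [to for _, to in pairs]
    return steps


if __name__ == "__main__":
    # Table 1 numbers these steps 2 through 7.
    for number, step in enumerate(pooling_scheme(plate_wells()), start=2):
        print(number, step[:3], "..." if len(step) > 3 else "")
```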

A final PCR amplification is then performed by taking 2 μl of the final ligation mix and adding it to 20 μl of PCR mix containing 10 mM TRIS-HCl, pH 9.0, 2.2 mM MgCl₂, 50 mM KCl, 0.2 mM each dNTP and 0.1% Triton X-100.

Prepare outside primers by taking 1 μl of F1 (forward primer) and 1 μl of R96 (reverse primer) at 250 μM (250 nmol/ml; 0.250 nmol/μl) and add to the 100 μl PCR reaction, giving a final concentration of 2.5 μM each oligo. Add 1 U Taq I polymerase and cycle for 35 cycles under the following conditions: 94 degrees for 30 s; 50 degrees for 30 s; and 72 degrees for 60 s. Extract the mixture with phenol/chloroform. Precipitate with ethanol. Resuspend in 10 μl of dH₂O and analyze on an agarose gel.

TABLE 1
Pooling scheme for ligation assembly. Ligation method - Well pooling scheme. Each entry reads FROM → TO.

STEP | FROM → TO
1 | All F → All R
2 | A1 → A2, A3 → A4, A5 → A6, A7 → A8, A9 → A10, A11 → A12, B1 → B2, B3 → B4, B5 → B6, B7 → B8, B9 → B10, B11 → B12, C1 → C2, C3 → C4, C5 → C6, C7 → C8, C9 → C10, C11 → C12, D1 → D2, D3 → D4, D5 → D6, D7 → D8, D9 → D10, D11 → D12, E1 → E2, E3 → E4, E5 → E6, E7 → E8, E9 → E10, E11 → E12, F1 → F2, F3 → F4, F5 → F6, F7 → F8, F9 → F10, F11 → F12, G1 → G2, G3 → G4, G5 → G6, G7 → G8, G9 → G10, G11 → G12, H1 → H2, H3 → H4, H5 → H6, H7 → H8, H9 → H10, H11 → H12
3 | A2 → A4, A6 → A8, A10 → A12, B2 → B4, B6 → B8, B10 → B12, C2 → C4, C6 → C8, C10 → C12, D2 → D4, D6 → D8, D10 → D12, E2 → E4, E6 → E8, E10 → E12, F2 → F4, F6 → F8, F10 → F12, G2 → G4, G6 → G8, G10 → G12, H2 → H4, H6 → H8, H10 → H12
4 | A4 → A8, A12 → B4, B8 → B12, C4 → C8, C12 → D4, D8 → D12, E4 → E8, E12 → F4, F8 → F12, G4 → G8, G12 → H4, H8 → H12
5 | A8 → B4, B12 → C8, D4 → D12, E8 → F4, F12 → G8, H4 → H12
6 | B4 → C8, D12 → F4, G8 → H12
7 | C8, F4, H12 (the three remaining wells are pooled)

Assembly of Parsed Oligonucleotides Using Taq I Synthesis and Assembly

Arrayed sets of parsed overlapping oligonucleotides of about 25 to 150 bases in length each, with an overlap of about 12 to 75 base pairs (bp), are obtained. The oligonucleotide concentration is 250 μM (250 nmol/ml). 50 base oligos give T_(m)s from 75 to 85 degrees C., 6 to 10 OD₂₆₀, 11 to 15 nanomoles, 150 to 300 μg. Resuspend in 50 to 100 μl of H₂O to make 250 nmol/ml.

The invention envisions using a robotic workstation to accomplish nucleic acid assembly. In the present example, two working plates containing forward and reverse oligonucleotides in a PCR mix at 2.5 μM are prepared, and 1 μl of each oligo is added to 100 μl of PCR mix in a fresh microwell, providing one plate of forward and one of reverse oligos in an array. Cycling assembly is then initiated as follows according to the pooling scheme outlined in Table 1. In the present example, 96 cycles of assembly can be accomplished according to this scheme.

Remove 2 μl of well F-E1 to a fresh well; remove 2 μl of R-E1 to a fresh well; add 18 μl of 1×PCR mix; add 1 U of Taq I polymerase.

Cycle once:

-   94 degrees 30 s
-   52 degrees 30 s
-   72 degrees 30 s

Subsequently, remove 2 μl of well F-E2 to the reaction vessel; remove 2 μl of well R-D12 to the reaction vessel. Cycle once according to the temperatures above. Repeat the pooling and cycling according to the scheme outlined in Table 1 for about 96 cycles.
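The order of additions in this example (F-E1 with R-E1, then F-E2 with R-D12, and so on) steps forward through the forward plate and backward through the reverse plate from a common starting well. The Python sketch below generates that order. It assumes row-major well numbering with wrap-around at the plate ends, an assumption that also reproduces the orders listed in Tables 2 and 3; the function name is hypothetical.

```python
def sequential_pooling(start_well: str, n_steps: int,
                       rows: str = "ABCDEFGH", cols: int = 12) -> list:
    """Return (forward, reverse) well pairs for bi-directional addition.

    Step k adds the forward oligo k wells after the start well and the
    reverse oligo k wells before it, wrapping around the plate.
    """
    wells = [f"{r}{c}" for r in rows for c in range(1, cols + 1)]
    start = wells.index(start_well)
    n = len(wells)
    return [(f"F-{wells[(start + k) % n]}", f"R-{wells[(start - k) % n]}")
            for k in range(n_steps)]


if __name__ == "__main__":
    for step, pair in enumerate(sequential_pooling("E1", 5), start=1):
        print(step, *pair)
```

Starting from well E1 for 48 steps gives the order of Table 2; starting from well A1 for 96 steps gives the order of Table 3.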

A PCR amplification is then performed by taking 2 μl of the final reaction mix and adding it to 20 μl of a PCR mix comprising:

-   10 mM TRIS-HCl, pH 9.0
-   2.2 mM MgCl₂
-   50 mM KCl
-   0.2 mM each dNTP
-   0.1% Triton X-100

Outside primers are prepared by taking 1 μl of F1 and 1 μl of R96 at 250 μM (250 nmol/ml; 0.250 nmol/μl) and adding them to the 100 μl PCR reaction. This gives a final concentration of 2.5 μM each oligo. 1 U Taq I polymerase is subsequently added and the reaction is cycled for about 23 to 35 cycles under the following conditions:

-   94 degrees 30 s
-   50 degrees 30 s
-   72 degrees 60 s

The reaction is subsequently extracted with phenol/chloroform, precipitated with ethanol and resuspended in 10 μl of dH₂O for analysis on an agarose gel.

Equal amounts of forward and reverse oligos are combined pairwise by taking 10 μl of forward and 10 μl of reverse oligo and mixing them in a new 96-well v-bottom plate. This provides one array with sets of duplex oligonucleotides at 250 μM, according to pooling scheme Step 1 in Table 1. An assembly plate is prepared by taking 2 μl of each oligomer pair and adding it to a plate containing 100 μl of ligation mix in each well. This gives an effective concentration of 2.5 μM or 2.5 nmol/ml. About 20 μl of each well is transferred to a fresh microwell plate, and 1 μl of T4 polynucleotide kinase and 1 μl of 1 mM ATP are added to each well. Each reaction will have 50 pmoles of oligonucleotide and 1 nmole ATP. Incubate at 37 degrees for 30 minutes.

Nucleic acid assembly is initiated according to Steps 2-7 of Table 1. Step 2 pooling is carried out by mixing each well with the next well in succession. 1 μl of Taq I ligase is added to each mixed well and the mixture is cycled once as follows:

-   94 degrees 30 s
-   52 degrees 30 s
-   72 degrees 10 minutes

Step 3 of the pooling scheme is carried out and cycled according to the temperature scheme above. Steps 4 and 5 of the pooling scheme are carried out and cycled according to the temperature scheme above. Carry out pooling scheme step 6 and take 10 μl of each mix into a fresh microwell. Step 7 of the pooling scheme is carried out by pooling the remaining three wells. The reaction volumes will be (initial plate has 20 μl per well):

-   Step 2: 20 μl + 20 μl = 40 μl
-   Step 3: 80 μl
-   Step 4: 160 μl
-   Step 5: 230 μl
-   Step 6: 10 μl + 10 μl = 20 μl
-   Step 7: 20 + 20 + 20 = 60 μl final reaction volume

A final PCR amplification is performed by taking 2 μl of the final ligation mix and adding it to 20 μl of PCR mix comprising:

-   10 mM TRIS-HCl, pH 9.0
-   2.2 mM MgCl₂
-   50 mM KCl
-   0.2 mM each dNTP
-   0.1% Triton X-100

Outside primers are prepared by taking 1 μl of F1 and 1 μl of R96 at 250 μM (250 nmol/ml; 0.250 nmol/μl) and adding them to the 100 μl PCR reaction, giving a final concentration of 2.5 μM for each oligo. Subsequently, 1 U of Taq I polymerase is added and the reaction is cycled for about 23 to 35 cycles under the following conditions:

-   94 degrees 30 s
-   50 degrees 30 s
-   72 degrees 60 s

The product is extracted with phenol/chloroform, precipitated with ethanol, resuspended in 10 μl of dH₂O and analyzed on an agarose gel.

TABLE 2
Pooling scheme for assembly using Taq I polymerase (also topoisomerase II). At each step the forward and reverse oligos are combined (+), followed by a pause.

Step | Forward oligo | Reverse oligo
1 | F-E1 | R-E1
2 | F-E2 | R-D12
3 | F-E3 | R-D11
4 | F-E4 | R-D10
5 | F-E5 | R-D9
6 | F-E6 | R-D8
7 | F-E7 | R-D7
8 | F-E8 | R-D6
9 | F-E9 | R-D5
10 | F-E10 | R-D4
11 | F-E11 | R-D3
12 | F-E12 | R-D2
13 | F-F1 | R-D1
14 | F-F2 | R-C12
15 | F-F3 | R-C11
16 | F-F4 | R-C10
17 | F-F5 | R-C9
18 | F-F6 | R-C8
19 | F-F7 | R-C7
20 | F-F8 | R-C6
21 | F-F9 | R-C5
22 | F-F10 | R-C4
23 | F-F11 | R-C3
24 | F-F12 | R-C2
25 | F-G1 | R-C1
26 | F-G2 | R-B12
27 | F-G3 | R-B11
28 | F-G4 | R-B10
29 | F-G5 | R-B9
30 | F-G6 | R-B8
31 | F-G7 | R-B7
32 | F-G8 | R-B6
33 | F-G9 | R-B5
34 | F-G10 | R-B4
35 | F-G11 | R-B3
36 | F-G12 | R-B2
37 | F-H1 | R-B1
38 | F-H2 | R-A12
39 | F-H3 | R-A11
40 | F-H4 | R-A10
41 | F-H5 | R-A9
42 | F-H6 | R-A8
43 | F-H7 | R-A7
44 | F-H8 | R-A6
45 | F-H9 | R-A5
46 | F-H10 | R-A4
47 | F-H11 | R-A3
48 | F-H12 | R-A2

TABLE 3
Alternate pooling scheme (initiating assembly from the 5′ or 3′ end). Each step is followed by: denature, anneal, polymerase extension.

1. F-A1 → R-A1
2. F-A2 → R-H12
3. F-A3 → R-H11
4. F-A4 → R-H10
5. F-A5 → R-H9
6. F-A6 → R-H8
7. F-A7 → R-H7
8. F-A8 → R-H6
9. F-A9 → R-H5
10. F-A10 → R-H4
11. F-A11 → R-H3
12. F-A12 → R-H2
13. F-B1 → R-H1
14. F-B2 → R-G12
15. F-B3 → R-G11
16. F-B4 → R-G10
17. F-B5 → R-G9
18. F-B6 → R-G8
19. F-B7 → R-G7
20. F-B8 → R-G6
21. F-B9 → R-G5
22. F-B10 → R-G4
23. F-B11 → R-G3
24. F-B12 → R-G2
25. F-C1 → R-G1
26. F-C2 → R-F12
27. F-C3 → R-F11
28. F-C4 → R-F10
29. F-C5 → R-F9
30. F-C6 → R-F8
31. F-C7 → R-F7
32. F-C8 → R-F6
33. F-C9 → R-F5
34. F-C10 → R-F4
35. F-C11 → R-F3
36. F-C12 → R-F2
37. F-D1 → R-F1
38. F-D2 → R-E12
39. F-D3 → R-E11
40. F-D4 → R-E10
41. F-D5 → R-E9
42. F-D6 → R-E8
43. F-D7 → R-E7
44. F-D8 → R-E6
45. F-D9 → R-E5
46. F-D10 → R-E4
47. F-D11 → R-E3
48. F-D12 → R-E2
49. F-E1 → R-E1
50. F-E2 → R-D12
51. F-E3 → R-D11
52. F-E4 → R-D10
53. F-E5 → R-D9
54. F-E6 → R-D8
55. F-E7 → R-D7
56. F-E8 → R-D6
57. F-E9 → R-D5
58. F-E10 → R-D4
59. F-E11 → R-D3
60. F-E12 → R-D2
61. F-F1 → R-D1
62. F-F2 → R-C12
63. F-F3 → R-C11
64. F-F4 → R-C10
65. F-F5 → R-C9
66. F-F6 → R-C8
67. F-F7 → R-C7
68. F-F8 → R-C6
69. F-F9 → R-C5
70. F-F10 → R-C4
71. F-F11 → R-C3
72. F-F12 → R-C2
73. F-G1 → R-C1
74. F-G2 → R-B12
75. F-G3 → R-B11
76. F-G4 → R-B10
77. F-G5 → R-B9
78. F-G6 → R-B8
79. F-G7 → R-B7
80. F-G8 → R-B6
81. F-G9 → R-B5
82. F-G10 → R-B4
83. F-G11 → R-B3
84. F-G12 → R-B2
85. F-H1 → R-B1
86. F-H2 → R-A12
87. F-H3 → R-A11
88. F-H4 → R-A10
89. F-H5 → R-A9
90. F-H6 → R-A8
91. F-H7 → R-A7
92. F-H8 → R-A6
93. F-H9 → R-A5
94. F-H10 → R-A4
95. F-H11 → R-A3
96. F-H12 → R-A2

Assembly of Nucleic Acid Molecules

The nucleic acid molecules listed in Table 4 have been produced using the methods described herein. The features and characteristics of each nucleic acid molecule are also described in Table 4.

As described in Table 4, a synthetic plasmid of 4800 bp in length was assembled. The plasmid comprises 192 oligonucleotides (two sets of 96 overlapping 50-mers; 25 bp overlap). The plasmid is essentially pUC containing kanamycin resistance instead of ampicillin resistance. The synthetic plasmid also contains lux A and B genes from the Vibrio fischeri bacterial luciferase gene. The SynPuc19 plasmid is 2700 bp in length, comprising a sequence essentially identical to pUC19 only shortened to precisely 2700 bp. Two sets of 96 50-mers were used to assemble the plasmid. In the Synlux4 plasmid, pUC19 was shortened and the luxA gene was added. 54 100-mer oligonucleotides comprising two sets of 27 oligonucleotides were used to assemble the plasmid. The miniQE10 plasmid comprising 2400 bp was assembled using 48 50-mer oligonucleotides. MiniQE10 is an expression plasmid containing a 6×His tag and bacterial promoter for high-level polypeptide expression. MiniQE10 was assembled and synthesized using the Taq I polymerase amplification method of the invention. The microQE plasmid is a minimal plasmid containing only an ampicillin gene, an origin of replication and a linker of pQE plasmids. MicroQE was assembled using either combinatoric ligation with 24 50-mers or one-tube PCR amplification. The SynFib1, SynFibB and SynFibG nucleic acid sequences are synthetic human fibrinogens manufactured using E. coli codons to optimize expression in a prokaryotic expression system.

TABLE 4
Synthetic nucleic acid molecules produced using the methods of the invention. (Column headings are inferred from the accompanying description; the source table lists values only.)

Name | Length (bp) | Number of oligos | Oligo length | Form | Oligo set
Synthetic Plasmid | 4800 | 192 | 50 | circular | F1-F96
SynPUC/19 | 2700 | 192 | 50 | circular | F01-F96
SynLux/4 | 2700 | 54 | 100 | circular | F1-27
MiniQE10 | 2400 | 48 | 50 | circular |
MicroQE | 1200 | 24 | 50 | circular | MQEF-1,24
Synfib1 | 1850 | 75 | 50 | linear | SFAF1-37
pQE25 | 2400 | 96 | 25 | circular | F1-F48
SynFibB | 1500 | 60 | 59 50-mers, 1 25-mer | linear | FibbF1-30
SynFibG | 1350 | 54 | 53 50-mers, 1 25-mer | linear | FibgF1-27

It is to be understood that while the invention has been described in conjunction with the detailed description thereof, the foregoing description is intended to illustrate and not limit the scope of the invention, which is defined by the scope of the appended claims. Other aspects, advantages, and modifications are within the scope of the following claims.

1. A method of synthesizing a target polynucleotide comprising:
a) providing a target polynucleotide sequence;
b) identifying at least one initiating polynucleotide present in the target polynucleotide of a), wherein the initiating polynucleotide comprises at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of a 5′ overhang and a 3′ overhang;
c) identifying a second polynucleotide present in the target polynucleotide of a), wherein the second polynucleotide is contiguous with the initiating polynucleotide and comprises at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of a 5′ overhang, a 3′ overhang, or a 5′ overhang and a 3′ overhang, wherein at least one overhang of the second polynucleotide is complementary to at least one overhang of the initiating polynucleotide;
d) identifying a third polynucleotide present in the target polynucleotide of a), wherein the third polynucleotide is contiguous with the initiating sequence and comprises at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of a 5′ overhang, a 3′ overhang, or a 5′ overhang and a 3′ overhang, wherein at least one overhang of the third polynucleotide is complementary to at least one overhang of the initiating polynucleotide which is not complementary to an overhang of the second polynucleotide;
e) contacting the initiating polynucleotide of b) with the second polynucleotide of c) and the third polynucleotide of d) under conditions and for such time suitable for annealing, the contacting resulting in a contiguous double-stranded polynucleotide, wherein the initiating sequence is extended bi-directionally;
f) in the absence of primer extension, optionally contacting the mixture of e) with a ligase under conditions suitable for ligation; and
g) optionally repeating b) through f) to sequentially add double-stranded polynucleotides to the extended initiating polynucleotide through repeated cycles of annealing and ligation, whereby a target polynucleotide is synthesized.
2. The method of claim 1, wherein the target polynucleotide sequence encodes a target polypeptide.
3. The method of claim 2, wherein the target polypeptide is a protein.
4. The method of claim 3, wherein the protein is an enzyme.
5. The method of claim 1, wherein the initiating polynucleotide sequence is identified by a computer program.
6. The method of claim 5, wherein the computer program comprises the following algorithm:
7. The method of claim 1, wherein the plus strand of the initiating, second or third polynucleotide is about 15 to 1000 nucleotides in length.
8. The method of claim 1, wherein the plus strand of the initiating, second or third polynucleotide is about 20 to 500 nucleotides in length.
9. The method of claim 1, wherein the plus strand of the initiating, second or third polynucleotide is about 25 to 100 nucleotides in length.
10. The method of claim 1, wherein the minus strand of the initiating, second or third polynucleotide is about 15 to 1000 nucleotides in length.
11. The method of claim 1, wherein the minus strand of the initiating, second or third polynucleotide is about 20 to 500 nucleotides in length.
12. The method of claim 1, wherein the minus strand of the initiating, second or third polynucleotide is about 25 to 100 nucleotides in length.
13. The method of claim 1, wherein the initiating polynucleotide is attached to a solid support.
14. A method of synthesizing a target polynucleotide comprising:
a) providing a target polynucleotide sequence derived from a model sequence;
b) identifying at least one initiating polynucleotide sequence present in the target polynucleotide sequence of a), wherein the initiating polynucleotide comprises: 1) a first plus strand oligonucleotide; 2) a second plus strand oligonucleotide contiguous with the first plus strand oligonucleotide; and 3) a minus strand oligonucleotide comprising a first contiguous sequence that is at least partially complementary to the first plus strand oligonucleotide and second contiguous sequence which is at least partially complementary to the second plus strand oligonucleotide;
c) annealing the first plus strand oligonucleotide and the second plus strand oligonucleotide to the minus strand oligonucleotide of b) resulting in a partially double-stranded initiating polynucleotide comprised of a 5′ overhang and a 3′ overhang;
d) identifying a second polynucleotide sequence present in the target polynucleotide sequence of a), wherein the second polynucleotide sequence is contiguous with the initiating polynucleotide sequence and comprises: 1) a first plus strand oligonucleotide; 2) a second plus strand oligonucleotide contiguous with the first plus strand oligonucleotide; and 3) a minus strand oligonucleotide comprising a first contiguous sequence which is at least partially complementary to the first plus strand oligonucleotide and second contiguous sequence which is at least partially complementary to the second plus strand oligonucleotide;
e) annealing the first plus strand oligonucleotide and the second plus strand oligonucleotide to the minus strand oligonucleotide of d) resulting in a partially double-stranded second polynucleotide, wherein at least one overhang of the second polynucleotide is complementary to at least one overhang of the initiating polynucleotide;
f) identifying a third polynucleotide present in the target polynucleotide of a), wherein the third polynucleotide is contiguous with the initiating sequence and comprises: 1) a first plus strand oligonucleotide; 2) a second plus strand oligonucleotide contiguous with the first plus strand oligonucleotide; and 3) a minus strand oligonucleotide comprising a first contiguous sequence which is at least partially complementary to the first plus strand oligonucleotide and second contiguous sequence which is at least partially complementary to the second plus strand oligonucleotide;
g) annealing the first plus strand oligonucleotide and the second plus strand oligonucleotide to the minus strand oligonucleotide of f) resulting in a partially double-stranded second polynucleotide, wherein at least one overhang of the third polynucleotide is complementary to at least one overhang of the initiating polynucleotide and not complementary to an overhang of the second polynucleotide;
h) contacting the initiating polynucleotide of c) with the second polynucleotide of e) and the third polynucleotide of g) under conditions and for such time suitable for annealing, the contacting resulting in a contiguous double-stranded polynucleotide, wherein the initiating sequence is extended bi-directionally;
i) in the absence of primer extension, optionally contacting the mixture of h) with a ligase under conditions suitable for ligation; and
j) optionally repeating b) through i) to sequentially add double-stranded polynucleotides to the extended initiating polynucleotide through repeated cycles of annealing and ligation, whereby a target polynucleotide is synthesized.
 15. A method for synthesizing a target polynucleotide, comprising: a) providing a target polynucleotide sequence; b) identifying at least one initiating polynucleotide present in the target polynucleotide of a), wherein the initiating polynucleotide comprises at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide; c) contacting the initiating polynucleotide under conditions suitable for primer annealing with a first oligonucleotide having partial complementarity to the 3′ portion of the plus strand of the initiating polynucleotide, and a second oligonucleotide having partial complementarity to the 3′ portion of the minus strand of the initiating polynucleotide; d) catalyzing under conditions suitable for primer extension: 1) polynucleotide synthesis from the 3′-hydroxyl of the plus strand of the initiating polynucleotide; 2) polynucleotide synthesis from the 3′-hydroxyl of the annealed first oligonucleotide; 3) polynucleotide synthesis from the 3′-hydroxyl of the minus strand of the initiating polynucleotide; and 4) polynucleotide synthesis from the 3′-hydroxyl of the annealed second oligonucleotide, wherein the initiating sequence is extended bi-directionally thereby forming a nascent extended initiating polynucleotide; e) contacting the extended initiating polynucleotide of d) under conditions suitable for primer annealing with a third oligonucleotide having partial complementarity to the 3′ portion of the plus strand of the extended initiating polynucleotide, and a fourth oligonucleotide having partial complementarity to the 3′ portion of the minus strand of the extended initiating polynucleotide; f) catalyzing under conditions suitable for primer extension: 1) polynucleotide synthesis from the 3′-hydroxyl of the plus strand of the extended initiating polynucleotide; 2) polynucleotide synthesis from the 3′-hydroxyl of the annealed third oligonucleotide; 3) polynucleotide synthesis from the 3′-hydroxyl of the minus strand of the extended initiating polynucleotide; and 4) polynucleotide synthesis from the 3′-hydroxyl of the annealed fourth oligonucleotide, wherein the extended initiating sequence is extended bi-directionally thereby forming a nascent extended initiating polynucleotide; and g) optionally repeating e) through f) as desired, resulting in formation of the target polynucleotide sequence.
 16. The method of claim 15, wherein the target polynucleotide sequence encodes a target polypeptide.
 17. The method of claim 16, wherein the target polypeptide is a protein.
 18. The method of claim 17, wherein the protein is an enzyme.
 19. The method of claim 15, wherein the initiating polynucleotide is identified by an algorithm.
 20. A method of synthesizing a target polynucleotide comprising: a) providing a target polynucleotide sequence; b) identifying at least one initiating polynucleotide present in the target polynucleotide of a), wherein the initiating polynucleotide comprises at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of at least a 5′ overhang or a 3′ overhang; c) identifying a second polynucleotide present in the target polynucleotide of a), wherein the second polynucleotide is contiguous with the initiating polynucleotide and comprises at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of a 5′ overhang, a 3′ overhang, or a 5′ overhang and a 3′ overhang, wherein at least one overhang of the second polynucleotide is complementary to the overhang of the initiating polynucleotide; d) contacting the initiating polynucleotide of b) with the second polynucleotide of c) under conditions and for such time suitable for annealing, the contacting resulting in a contiguous double-stranded polynucleotide, wherein the initiating sequence is extended uni-directionally; e) in the absence of primer extension, optionally contacting the mixture of d) with a ligase under conditions suitable for ligation; and f) optionally repeating b) through e) to sequentially add double-stranded polynucleotides to the extended initiating polynucleotide through repeated cycles of annealing and ligation, whereby a target polynucleotide is synthesized.
 21. The method of claim 15, wherein the plus strand of the initiating, second or third polynucleotide is about 15 to 1000 nucleotides in length.
 22. The method of claim 15, wherein the plus strand of the initiating, second or third polynucleotide is about 20 to 500 nucleotides in length.
 23. The method of claim 15, wherein the plus strand of the initiating, second or third polynucleotide is about 25 to 100 nucleotides in length.
 24. The method of claim 15, wherein the minus strand of the initiating, second or third polynucleotide is about 15 to 1000 nucleotides in length.
 25. The method of claim 15, wherein the minus strand of the initiating, second or third polynucleotide is about 20 to 500 nucleotides in length.
 26. The method of claim 15, wherein the minus strand of the initiating, second or third polynucleotide is about 25 to 100 nucleotides in length.
 27. The method of claim 15, wherein the initiating polynucleotide is attached to a solid support.
 28. A method for isolating a target polypeptide encoded by a target polynucleotide, comprising: a) providing a target polynucleotide sequence derived from a model sequence; b) identifying at least one initiating polynucleotide present in the target polynucleotide of a), wherein the initiating polynucleotide comprises at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of a 5′ overhang and a 3′ overhang; c) identifying a second polynucleotide present in the target polynucleotide of a), wherein the second polynucleotide is contiguous with the initiating sequence and comprises at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of a 5′ overhang, a 3′ overhang, or a 5′ overhang and a 3′ overhang, wherein at least one overhang of the second polynucleotide is complementary to at least one overhang of the initiating sequence; d) identifying a third polynucleotide present in the target polynucleotide of a), wherein the third polynucleotide is contiguous with the initiating sequence and comprises at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of a 5′ overhang, a 3′ overhang, or a 5′ overhang and a 3′ overhang, wherein at least one overhang of the third polynucleotide is complementary to at least one overhang of the initiating sequence which is not complementary to an overhang of the second polynucleotide; e) contacting the initiating polynucleotide of b) with the second polynucleotide of c) and the third polynucleotide of d) under conditions and for such time suitable for annealing, the contacting resulting in a contiguous double-stranded polynucleotide, wherein the initiating sequence is extended bi-directionally; f) in the absence of primer extension, optionally contacting the mixture of e) with a ligase under conditions suitable for ligation; g) optionally repeating b) through f) to sequentially add double-stranded polynucleotides to the extended initiating sequence through repeated cycles of annealing and ligation, whereby a target polynucleotide is synthesized; h) incorporating the target polynucleotide of g) in an expression vector; i) introducing the expression vector of h) into a suitable host cell; j) culturing the cell of i) under conditions and for such time as to promote the expression of the target polypeptide encoded by the target polynucleotide; and k) isolating the target polypeptide.
 29. The method of claim 28, wherein the target polypeptide is a chimeric protein.
 30. The method of claim 28, wherein the target polypeptide is a fusion protein.
 31. The method of claim 28, wherein the expression vector is a bacterial expression vector.
 32. The method of claim 29, wherein the expression vector is an animal cell expression vector.
 33. The method of claim 28, wherein the expression vector is an insect cell expression vector.
 34. The method of claim 28, wherein the expression vector is a retroviral vector.
 35. The method of claim 29, wherein the expression vector is contained in a host cell.
 36. The method of claim 35, wherein the host cell is a prokaryotic cell.
 37. The method of claim 35, wherein the host cell is a eukaryotic cell.
 38. The method of claims 1, 14, 15 or 27, wherein the oligonucleotides are produced by synthesis on an automated DNA synthesizer.
 39. A method of synthesizing a target polynucleotide comprising: a) providing a target polynucleotide sequence derived from a model sequence; b) chemically synthesizing a plurality of single-stranded oligonucleotides each of which is partially complementary to at least one oligonucleotide present in the plurality, wherein the sequence of the plurality of oligonucleotides is a contiguous sequence of the target polynucleotide; c) contacting the partially complementary oligonucleotides of b) under conditions and for such time suitable for annealing, the contacting resulting in a plurality of partially double-stranded polynucleotides, wherein each double-stranded polynucleotide is comprised of a 5′ overhang and a 3′ overhang; d) identifying at least one initiating polynucleotide derived from the model sequence, wherein the initiating polynucleotide is present in the plurality of double-stranded polynucleotides set forth in c); e) in the absence of primer extension, subjecting a mixture comprising the initiating polynucleotide and 1) a double-stranded polynucleotide that will anneal to the 5′ portion of said initiating sequence; 2) a double-stranded polynucleotide that will anneal to the 3′ portion of the initiating polynucleotide; and 3) a DNA ligase under conditions suitable for annealing and ligation, wherein the initiating polynucleotide is extended bi-directionally; f) sequentially annealing double-stranded polynucleotides to the extended initiating polynucleotide through repeated cycles of annealing, whereby the target polynucleotide is produced.
 40. The method of claim 39, wherein the oligonucleotides are produced by synthesis on an automated DNA synthesizer.
 41. A computer program, stored on a computer-readable medium, for generating a target polynucleotide sequence, the computer program comprising instructions for causing a computer system to: a) identify an initiating polynucleotide sequence contained in the target polynucleotide sequence; b) parse the target polynucleotide sequence into multiply distinct, partially complementary, oligonucleotides; c) control assembly of the target polynucleotide sequence by controlling the bi-directional extension of the initiating polynucleotide sequence by the sequential addition of partially complementary oligonucleotides resulting in a contiguous double-stranded polynucleotide.
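The bi-directional extension of step c) of claim 41 can be pictured as a schedule of addition cycles. The Python sketch below assumes that the parsed segments are numbered 0 to n-1 along the target and that each cycle adds at most one segment on each side of the growing product. The function name `bidirectional_schedule` and this indexing scheme are hypothetical and are not taken from the specification.

```python
def bidirectional_schedule(n_segments, initiating_index):
    """Return, cycle by cycle, the segment indices added to the growing
    product, starting from the initiating segment and moving outward."""
    if not 0 <= initiating_index < n_segments:
        raise ValueError("initiating_index must lie inside the target")
    cycles = []
    left, right = initiating_index - 1, initiating_index + 1
    while left >= 0 or right < n_segments:
        added = []
        if left >= 0:              # extend toward the start of the target
            added.append(left)
            left -= 1
        if right < n_segments:     # extend toward the end of the target
            added.append(right)
            right += 1
        cycles.append(added)
    return cycles


# Five segments with the initiating segment in the middle: two cycles.
print(bidirectional_schedule(5, 2))
```

When the initiating segment is off-center, the later cycles extend in one direction only, which corresponds to the uni-directional case of claim 20.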
 42. The computer program of claim 41, wherein the parsing is performed by an algorithm.
 43. The computer program of claim 42, wherein the algorithm comprises:

    $Overlap = <STDIN>;
    $seqlen = length($sequence);
    }
    $revcomp = "";
    for ($i = $seqlen-1; $i >= 0; $i--) {
        $base = substr($sequence,$i,1);
        if ($base eq "a") {$comp = "T";}
        elsif ($base eq "t") {$comp = "A";}
        elsif ($base eq "g") {$comp = "C";}
        elsif ($base eq "c") {$comp = "G";}
        elsif ($base eq "A") {$comp = "T";}
        elsif ($base eq "T") {$comp = "A";}
        elsif ($base eq "G") {$comp = "C";}
        elsif ($base eq "C") {$comp = "G";}
        else {$comp = "X"};
        $revcomp = $revcomp.$comp;
    }
    print OUT "Forward oligos\n";
    print "Forward oligos\n";
    $r = 1;
    for ($i = 0; $i <= $seqlen -1; $i+=$OL) {
        $oligo = substr($sequence,$i,$OL);
        print OUT "$oligname F- $r $oligo\n";
        print "$oligname F- $r  $oligo\n";
        $r = $r + 1;
    }
    $r = 1;
    for ($i = $seqlen - $Overlap - $OL; $i >= 0; $i-=$OL) {
        print OUT "\n";
        print "\n";
        $oligo = substr($revcomp,$i,$OL);
        print OUT "$oligname R- $r $oligo";
        print "$oligname R- $r  $oligo";
        $r = $r + 1;
    }
    $oligo = substr($revcomp,1,$Overlap);
    print OUT "$oligo\n";
    print "$oligo\n";

wherein $oligoname is the identifier name for the list and for each component oligonucleotide; $OL is the length of each component oligonucleotide; $Overlap is the length of the overlap in bases between each forward and each reverse oligonucleotide; $sequence is the DNA sequence in bases; $seqlen is the length of the DNA sequence in bases; $bas is the individual base in a sequence; $forseq is the sequence of a forward oligonucleotide; $revseq is the sequence of a reverse oligonucleotide; $revcomp is the reverse complemented sequence of the gene; $oligonameF-[ ] is the list of parsed forward oligos; and $oligonameR-[ ] is the list of parsed reverse oligos.
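The parsing idea of claim 43 can be restated as a short Python sketch. This is a simplified illustration rather than a line-by-line translation of the Perl listing. It assumes that forward oligonucleotides tile the plus strand in windows of `oligo_len` bases and that reverse oligonucleotides are the reverse complements of windows shifted by `overlap` bases, so that each reverse oligonucleotide bridges two neighboring forward oligonucleotides. The names `reverse_complement`, `parse_oligos`, `oligo_len` and `overlap` are hypothetical stand-ins for `$revcomp`, `$OL` and `$Overlap`.

```python
# Simplified sketch under the assumptions stated above; not the claimed listing.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(sequence):
    """Reverse complement in upper case; unrecognized symbols become 'X'."""
    return "".join(COMPLEMENT.get(base, "X") for base in reversed(sequence.upper()))


def parse_oligos(sequence, oligo_len, overlap):
    """Split a target sequence into forward and reverse oligonucleotides."""
    if oligo_len <= 0 or not 0 < overlap < oligo_len:
        raise ValueError("need oligo_len > 0 and 0 < overlap < oligo_len")
    plus = sequence.upper()
    # Forward oligos: consecutive windows of the plus strand.
    forward = [plus[i:i + oligo_len] for i in range(0, len(plus), oligo_len)]
    # Reverse oligos: the same window size, shifted by `overlap` bases,
    # then reverse complemented so each one is written 5' to 3'.
    reverse = [reverse_complement(plus[i:i + oligo_len])
               for i in range(overlap, len(plus), oligo_len)]
    return forward, reverse


forward, reverse = parse_oligos("ATGGCCAAGCTTGGATCC", oligo_len=6, overlap=3)
for number, oligo in enumerate(forward, start=1):
    print(f"F-{number} {oligo}")
for number, oligo in enumerate(reverse, start=1):
    print(f"R-{number} {oligo}")
```

In this sketch the first `overlap` bases of the first forward oligonucleotide stay single-stranded, which gives the assembled product an overhang at that end.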
 44. The computer program of claim 43, wherein the forward sequence is optionally converted to upper case using an algorithm comprising:

    $forseq = "";
    for ($j = 0; $j <= $seqlen-1; $j++) {
        $bas = substr($sequence,$j,1);
        if ($bas eq "a") {$cfor = "A";}
        elsif ($bas eq "t") {$cfor = "T";}
        elsif ($bas eq "c") {$cfor = "C";}
        elsif ($bas eq "g") {$cfor = "G";}
        elsif ($bas eq "A") {$cfor = "A";}
        elsif ($bas eq "T") {$cfor = "T";}
        elsif ($bas eq "C") {$cfor = "C";}
        elsif ($bas eq "G") {$cfor = "G";}
        else {$cfor = "X"};
        $forseq = $forseq.$cfor;
        print OUT "$j \n";
    }

wherein $seqlen is the length of the DNA sequence in bases; $bas is the individual base in a sequence; and $forseq is the sequence of a forward oligonucleotide.
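The upper-case conversion of claim 44 amounts to mapping each base to A, T, C or G and marking anything else as X. A compact Python sketch of the same rule follows; the function name `normalize_bases` is hypothetical, and the sketch omits the per-position output written by the Perl listing.

```python
def normalize_bases(sequence):
    """Upper-case a DNA string and replace any symbol other than
    A, T, C or G with 'X', mirroring the rule recited in claim 44."""
    allowed = {"A", "T", "C", "G"}
    return "".join(base.upper() if base.upper() in allowed else "X"
                   for base in sequence)


print(normalize_bases("atgNcG"))
```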
 45. A computer-assisted method for synthesizing a target polynucleotide encoding a target polypeptide derived from a model sequence using a programmed computer including a processor, an input device, and an output device, comprising: a) inputting into the programmed computer, through the input device, data including at least a portion of the target polynucleotide sequence encoding a target polypeptide; b) determining, using the processor, the sequence of at least one initiating polynucleotide present in the target polynucleotide sequence; c) selecting, using the processor, a model for synthesizing the target polynucleotide sequence based on the position of the initiating sequence in the target polynucleotide sequence using overall sequence parameters necessary for expression of the target polypeptide in a biological system; and d) outputting, to the output device, the results of the at least one determination.
 46. The method of claim 45, further comprising predicting, using the processor, whether changing the model sequence to the target polynucleotide will have an effect on the target polypeptide encoded by the target polynucleotide based on at least one physical, structural or phylogenetic characteristic of the model sequence.
 47. A method for automated synthesis of a target polynucleotide sequence, comprising: a) providing a user with an opportunity to communicate a desired target polynucleotide sequence; b) allowing the user to transmit the desired target polynucleotide sequence to a server; c) providing the user with a unique designation; and d) obtaining the transmitted target polynucleotide sequence provided by the user.
 48. The method of claim 47, further comprising: f) identifying at least one initiating polynucleotide present in the target polynucleotide of e), wherein the initiating polynucleotide comprises at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of a 5′ overhang and a 3′ overhang; g) identifying a second polynucleotide present in the target polynucleotide of e), wherein the second polynucleotide is contiguous with the initiating polynucleotide and comprises at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of a 5′ overhang, a 3′ overhang, or a 5′ overhang and a 3′ overhang, wherein at least one overhang of the second polynucleotide is complementary to at least one overhang of the initiating polynucleotide; h) identifying a third polynucleotide present in the target polynucleotide of e), wherein the third polynucleotide is contiguous with the initiating sequence and comprises at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide resulting in a partially double-stranded polynucleotide comprised of a 5′ overhang, a 3′ overhang, or a 5′ overhang and a 3′ overhang, wherein at least one overhang of the third polynucleotide is complementary to at least one overhang of the initiating polynucleotide which is not complementary to an overhang of the second polynucleotide; i) contacting the initiating polynucleotide of f) with the second polynucleotide of g) and the third polynucleotide of h) under conditions and for such time suitable for annealing, the contacting resulting in a contiguous double-stranded polynucleotide, wherein the initiating sequence is extended bi-directionally; j) in the absence of primer extension, optionally contacting the mixture of i) with a ligase under conditions suitable for ligation; and k) optionally repeating f) through j) to sequentially add double-stranded polynucleotides to the extended initiating polynucleotide through repeated cycles of annealing and ligation, whereby a target polynucleotide is synthesized.
 49. The method of claim 47, further comprising: f) identifying at least one initiating polynucleotide present in the target polynucleotide of e), wherein the initiating polynucleotide comprises at least one plus strand oligonucleotide annealed to at least one minus strand oligonucleotide; g) contacting the initiating polynucleotide under conditions suitable for primer annealing with a first oligonucleotide having partial complementarity to the 3′ portion of the plus strand of the initiating polynucleotide, and a second oligonucleotide having partial complementarity to the 3′ portion of the minus strand of the initiating polynucleotide; h) catalyzing under conditions suitable for primer extension: 1) polynucleotide synthesis from the 3′-hydroxyl of the plus strand of the initiating polynucleotide; 2) polynucleotide synthesis from the 3′-hydroxyl of the annealed first oligonucleotide; 3) polynucleotide synthesis from the 3′-hydroxyl of the minus strand of the initiating polynucleotide; and 4) polynucleotide synthesis from the 3′-hydroxyl of the annealed second oligonucleotide, wherein the initiating sequence is extended bi-directionally thereby forming a nascent extended initiating polynucleotide; i) contacting the extended initiating polynucleotide of h) under conditions suitable for primer annealing with a third oligonucleotide having partial complementarity to the 3′ portion of the plus strand of the extended initiating polynucleotide, and a fourth oligonucleotide having partial complementarity to the 3′ portion of the minus strand of the extended initiating polynucleotide; j) catalyzing under conditions suitable for primer extension: 1) polynucleotide synthesis from the 3′-hydroxyl of the plus strand of the extended initiating polynucleotide; 2) polynucleotide synthesis from the 3′-hydroxyl of the annealed third oligonucleotide; 3) polynucleotide synthesis from the 3′-hydroxyl of the minus strand of the extended initiating polynucleotide; and 4) polynucleotide synthesis from the 3′-hydroxyl of the annealed fourth oligonucleotide, wherein the extended initiating sequence is extended bi-directionally thereby forming a nascent extended initiating polynucleotide; and k) optionally repeating f) through j) as desired, resulting in formation of the target polynucleotide sequence.
 50. A method for automated synthesis of a polynucleotide, comprising: a) providing a user with a mechanism for communicating a model polynucleotide sequence; b) optionally providing the user with an opportunity to communicate at least one desired modification to the model sequence if desired; c) allowing the user to transmit the model sequence and desired modification to a server; d) providing the user with a unique designation; e) obtaining the transmitted model sequence and desired modification provided by the user; f) inputting into a programmed computer, through an input device, data including at least a portion of the model polynucleotide sequence; g) determining, using the processor, the sequence of the model polynucleotide sequence containing the desired modification; h) further determining, using the processor, at least one initiating polynucleotide sequence present in the model polynucleotide sequence; i) selecting, using the processor, a model for synthesizing the modified model polynucleotide sequence based on the position of the initiating sequence in the model polynucleotide sequence; and j) outputting, to the output device, the results of the at least one determination.
 51. An isolated polynucleotide composition comprising: a) an initiating polynucleotide comprising a plus strand and a minus strand, wherein the plus or minus strand is modified to incorporate a moiety that binds to a solid support; b) a first primer suitable for primer extension having partial complementarity to the 3′ portion of the plus strand of the initiating polynucleotide; c) a second primer suitable for primer extension having partial complementarity to the 3′ portion of the minus strand of the initiating polynucleotide; and d) a solid support matrix, wherein each of the first and second primers consists of about 25 to 1000 nucleotides.
 52. An isolated polynucleotide composition comprising: a) an initiating polynucleotide comprising a plus strand and a minus strand, wherein the plus or minus strand is modified to incorporate a moiety that binds to a solid support; b) a first primer suitable for primer extension having partial complementarity to the 3′ portion of the plus strand of the initiating polynucleotide; c) a second primer suitable for primer extension having partial complementarity to the 3′ portion of the minus strand of the initiating polynucleotide; and d) a solid support matrix, wherein each of the first and second primers consists of about 25 to 1000 nucleotides.